Foreword



This work was born from my desire to unify my two scientific back- 
grounds: physics and telecommunications. The study of Quantum In- 
formation Theory has given me the opportunity to complement these 
two subjects. Quantum theory predicts significant changes in our
concept of computation and information. The conceptual jump from
mathematical models to physical reality has outstanding consequences,
such as a new paradigm of complexity classes in the case of computation,
which allows problems believed to be classically intractable, such as the
Factoring Problem or the Discrete Logarithm Problem, to be solved in
polynomial time.

This thesis focuses on the classical capacity of quantum channels,
one of the first areas treated by quantum information theorists. The
problem has been essentially solved for some years. Nevertheless, this work will
give me a reason to introduce a consistent formalism of the quantum the- 
ory, as well as to review fundamental facts about quantum non-locality 
and how it can be used to enhance communication. Moreover, this re- 
flects my dwelling in the spirit of classical information theory, and it is 
intended to be a starting point towards a thorough study of how quan- 
tum technologies can help to shape the future of telecommunications. 

Whenever possible, heuristic reasoning was preferred to rigorous
mathematical proofs. The reason is that I am a self-taught neophyte in
the field, and nearly every time I came across a new concept, physical
arguments were more compelling to me than bare mathematics. The
technical content of the thesis is twofold. On the one hand, a quadratic
classification based on optimization programs that I devised for
distinguishing entangled states is presented in Chapter 4. On the other
hand, a less difficult yet, I hope, equally interesting technical part
consists of versions of some proofs throughout the text.
Chapter 1

A Mathematical Model for 
Communication 

Information Theory is mainly concerned with two issues. The first is
to establish theoretical bounds on the achievable rates at which
information can be compressed from a source and conveyed through a
channel. To this end, achievability and converse theorems for different
communication scenarios must be found. However, it is important to
realize that these theorems disregard the complexity and delay of the
codes that attain the bounds. The second is to find practical coding
schemes that perform close to the theoretical limits. In this
dissertation we will study exclusively the first of these two problems,
which Quantum Mechanics has endowed with an even richer variety of
questions.

Most of this chapter is based on the texts [1][2][3]. Since this chapter 
is a review of basic concepts, results about stochastic processes and 
typicality will not be proved. 

1.1 What is Information? 

Before starting, one should perhaps face the question "what is
information?". How should this ubiquitous and quasi-philosophical notion
be described mathematically? It seems natural to define information in
terms of probability theory, for it is the mathematical framework that
formally incorporates the concept of uncertainty about the future.

In the first half of the past century, several meaningful definitions
arose in the context of statistics and engineering, such as Fisher's
information (which is a measure of the curvature of the log-likelihood)
or Hartley's function (the logarithm of the source's alphabet size)^1.

In 1948, guided by some reasonable assumptions, Shannon came up
with entropy, H, as a measure of information. Among others, his
requirements were that:

1. $H(\mathbf{p})$ be continuous in $\mathbf{p}$ (with $\mathbf{1}^T\mathbf{p} = 1$).

2. $H(\mathbf{p})$ be, for $p_i = 1/n$, a monotonically increasing function of $n$.
This is equivalent to a normalization.

3. If a choice is broken down into successive choices, the original
entropy be the weighted sum of the individual values of the
resulting entropies.

It can be shown that the only function, up to a proportionality
constant, satisfying these assumptions is^2:

$$ H(\mathbf{p}) = -\sum_{i=1}^{n} p_i \log p_i \tag{1.1} $$

In fact, Shannon's entropy descends from deeper concepts such as
the relative entropy (also known as Kullback-Leibler distance) and mutual
information. The relative entropy of two probability distributions is
given by:

$$ D(\mathbf{p}\|\mathbf{q}) = \sum_{i=1}^{n} p_i \log\frac{p_i}{q_i} \tag{1.2} $$

with $\mathbf{1}^T\mathbf{p} = \mathbf{1}^T\mathbf{q} = 1$, that is, $\mathbf{p}$ and $\mathbf{q}$ belong to the discrete
probability simplex of dimension n. Although in general
$D(\mathbf{p}\|\mathbf{q}) \neq D(\mathbf{q}\|\mathbf{p})$, so that it is not a true metric, this quantity can be
thought of as a "distance" between probability distributions.

^1 More general and deeper concepts such as Renyi's entropy or Kolmogorov's
algorithmic complexity and their far-reaching implications are not discussed here
for the sake of conciseness.

^2 Throughout this thesis, logarithms will be taken in base 2.

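Mutual information is the amount of information that a random
variable contains about another random variable^3. Consider X taking
values in $\mathcal{X}$ and Y taking values in $\mathcal{Y}$, and let $I(X;Y)$ denote
$I(\mathbf{p}^X;\mathbf{q}^Y)$; then their mutual information is:

$$ I(X;Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} p^{XY}_{ij} \log\frac{p^{XY}_{ij}}{p^X_i\, q^Y_j} \tag{1.3} $$

where $\mathbf{p}^{XY}$ is the joint probability distribution of both random
variables. In other words, mutual information is the relative entropy
between the joint distribution and the product of its marginals. If the
variables are independent, mutual information vanishes, which
means that knowing the realization of one random variable does not
give any clue about the other one.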

In turn, Shannon's entropy is a special case of mutual information,
being the information that a random variable contains about itself,
$H(\mathbf{p}^X) = I(X;X)$. It will suffice to prove some properties of the
relative entropy, because they can be straightforwardly extended to mutual
information and entropy.
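
As a minimal sketch of definitions (1.1) to (1.3), assuming distributions are given as plain Python lists of probabilities: the function names and the three toy joint distributions below are illustrative choices, not part of the original text.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits, eq. (1.1); terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def relative_entropy(p, q):
    """Relative entropy D(p||q) in bits, eq. (1.2); infinite if q_i = 0 where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total

def mutual_information(joint):
    """I(X;Y) as D(joint || product of marginals), eq. (1.3); joint[i][j] = p(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    qy = [sum(col) for col in zip(*joint)]
    flat_joint = [pij for row in joint for pij in row]
    flat_product = [a * b for a in px for b in qy]
    return relative_entropy(flat_joint, flat_product)

# Hypothetical toy joint distributions
joint = [[0.4, 0.1], [0.1, 0.4]]              # correlated variables
independent = [[0.25, 0.25], [0.25, 0.25]]    # I(X;Y) vanishes
diagonal = [[0.5, 0.0], [0.0, 0.5]]           # Y = X, so I(X;X) = H(X)
```

With these definitions, `mutual_information(independent)` vanishes and `mutual_information(diagonal)` coincides with `entropy([0.5, 0.5])`, which illustrates the statement that entropy is the information a random variable contains about itself.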



Theorem 1 [Nonnegativity of relative entropy] The relative entropy is
nonnegative, $D(\mathbf{p}\|\mathbf{q}) \ge 0$.

Proof 1 Let $A = \mathrm{Supp}(\mathbf{p})$ be the support of $\mathbf{p}$. Then

$$
\begin{aligned}
D(\mathbf{p}\|\mathbf{q}) &= -\sum_{i\in A} p_i \log\frac{q_i}{p_i} \\
 &\ge -\log\sum_{i\in A} p_i\,\frac{q_i}{p_i} \\
 &= -\log\sum_{i\in A} q_i \\
 &\ge -\log\sum_{i=1}^{n} q_i \\
 &= -\log 1 \\
 &= 0
\end{aligned}
\tag{1.4}
$$

The first inequality is a consequence of Jensen's inequality for convex
functions, $E[f(x)] \ge f(E[x])$, applied to $f = -\log$ and $x = q_i/p_i$.
The second inequality comes from extending the range of the sum.

^3 Note that throughout this text, we will often interchange random variables and
their induced probability distributions, and vice versa.

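Theorem 2 [Convexity of relative entropy] The relative entropy is a
convex function of the pair of probability distributions $(\mathbf{p}, \mathbf{q})$.

Proof 2 By the log sum inequality,
$\sum_{i=1}^{n} a_i \log\frac{a_i}{b_i} \ge \left(\sum_{i=1}^{n} a_i\right)\log\frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$
[1], we have that:

$$
(\lambda p_i + (1-\lambda)p'_i)\log\frac{\lambda p_i + (1-\lambda)p'_i}{\lambda q_i + (1-\lambda)q'_i}
\le \lambda p_i\log\frac{\lambda p_i}{\lambda q_i} + (1-\lambda)p'_i\log\frac{(1-\lambda)p'_i}{(1-\lambda)q'_i}
\tag{1.5}
$$

with $\lambda \in [0, 1]$. Summing over the index i we get:

$$ D(\lambda\mathbf{p} + (1-\lambda)\mathbf{p}' \,\|\, \lambda\mathbf{q} + (1-\lambda)\mathbf{q}') \le \lambda D(\mathbf{p}\|\mathbf{q}) + (1-\lambda)D(\mathbf{p}'\|\mathbf{q}') \tag{1.6} $$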
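
The convexity statement can be checked numerically. The following is only a sanity check under illustrative assumptions (two hypothetical pairs of distributions with strictly positive entries, and a grid of eleven values of the mixing weight); it does not replace the proof.

```python
import math

def relative_entropy(p, q):
    """D(p||q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two distributions."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]

def convexity_gap(p, q, p2, q2, lam):
    """Right-hand side minus left-hand side of eq. (1.6); nonnegative by Theorem 2."""
    lhs = relative_entropy(mix(p, p2, lam), mix(q, q2, lam))
    rhs = lam * relative_entropy(p, q) + (1 - lam) * relative_entropy(p2, q2)
    return rhs - lhs

# Hypothetical distributions on a three-letter alphabet
p, q = [0.7, 0.2, 0.1], [0.2, 0.5, 0.3]
p2, q2 = [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]

gaps = [convexity_gap(p, q, p2, q2, k / 10) for k in range(11)]
```

Every entry of `gaps` should be nonnegative, and the gap closes at the endpoints of the interval, where the mixture reduces to one of the two original pairs.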

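Corollary 1 [Concavity of entropy] Entropy is a concave function of $\mathbf{p}^X$.

Proof 3 Consider the uniform distribution $\mathbf{u} = \frac{1}{\|\mathcal{X}\|}(1, 1, \ldots, 1)$. The
relative entropy of a distribution $\mathbf{p}^X$ with respect to $\mathbf{u}$ is:

$$ D(\mathbf{p}\|\mathbf{u}) = \sum_{i=1}^{n} p_i\log p_i - \sum_{i=1}^{n} p_i\log u_i = -H(\mathbf{p}^X) + \log\|\mathcal{X}\| $$

so we get:

$$ H(\mathbf{p}^X) = \log\|\mathcal{X}\| - D(\mathbf{p}\|\mathbf{u}) \tag{1.7} $$

Since $D(\mathbf{p}\|\mathbf{u})$ is convex in $\mathbf{p}$ (Theorem 2) and $\log\|\mathcal{X}\|$ is a constant,
$H(\mathbf{p}^X)$ is concave.

Since $D(\mathbf{p}\|\mathbf{u}) \ge 0$ (Theorem 1), it also follows that entropy is upper
bounded by the logarithm of the cardinality of the alphabet,
$H(\mathbf{p}^X) \le \log\|\mathcal{X}\|$.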
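
As a short worked instance of (1.7), take a hypothetical three-letter source with $\mathbf{p} = (1/2, 1/4, 1/4)$; the numbers below are chosen only for illustration.

```latex
\begin{aligned}
H(\mathbf{p}) &= \tfrac12\log 2 + \tfrac14\log 4 + \tfrac14\log 4 = 1.5 \text{ bits},\\
\log\|\mathcal{X}\| &= \log 3 \approx 1.585 \text{ bits},\\
D(\mathbf{p}\|\mathbf{u}) &= \log 3 - H(\mathbf{p}) \approx 0.085 \text{ bits} \ge 0 .
\end{aligned}
```

The gap between the entropy and its upper bound is exactly the relative entropy to the uniform distribution, and it closes only when $\mathbf{p} = \mathbf{u}$.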

A related important quantity is the conditional entropy of a random
variable Y given that the realization of X is known, i.e. the residual
uncertainty about Y once we learn X:

$$ H(Y|X = x_i) = -\sum_{j=1}^{m} q^{Y|X}_{j|i}\log q^{Y|X}_{j|i} \tag{1.8} $$

Averaging over all possible outcomes of X:

$$ H(Y|X) = \sum_{i=1}^{n} p^X_i\, H(Y|X = x_i) = -\sum_{i=1}^{n}\sum_{j=1}^{m} p^{XY}_{ij}\log q^{Y|X}_{j|i} \tag{1.9} $$

Clearly there is a reduction in the uncertainty only if the joint
probability distribution does not factorize. In other words, if the two
random variables are independent, then $H(Y|X) = H(Y)$. By symmetry
arguments one can easily find the relations:

$$ H(X) - H(X|Y) = I(X;Y) = H(Y) - H(Y|X) \tag{1.10} $$

$$ I(X;Y) \le \min\{H(X), H(Y)\} \tag{1.11} $$

One useful property which makes use of the conditional entropy is
the chain rule for entropy. Let X, Y and Z be three random variables;
then their joint entropy can be written:

$$ H(X, Y, Z) = H(X) + H(Y|X) + H(Z|X, Y) \tag{1.12} $$

which is easily generalizable to any number of random variables.
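
The relations (1.9) to (1.11) and the two-variable form of the chain rule can be verified on a small example. This is a sketch under illustrative assumptions: the joint distribution is hypothetical, and it is stored as a dictionary keyed by outcome pairs (x, y).

```python
import math

def H(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, index):
    """Marginal distribution of the variable at position `index` of each outcome tuple."""
    marg = {}
    for key, p in joint.items():
        marg[key[index]] = marg.get(key[index], 0.0) + p
    return marg

def conditional_entropy(joint, given):
    """H(A|B) as in eq. (1.9): average over outcomes b of H(A | B = b)."""
    total = 0.0
    for b, pb in marginal(joint, given).items():
        if pb > 0:
            total += pb * H([p / pb for key, p in joint.items() if key[given] == b])
    return total

# Hypothetical joint distribution p(x, y)
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

h_xy = H(joint.values())
h_x = H(marginal(joint, 0).values())
h_y = H(marginal(joint, 1).values())
h_y_given_x = conditional_entropy(joint, 0)
h_x_given_y = conditional_entropy(joint, 1)
```

Here `h_x + h_y_given_x` reproduces `h_xy` (chain rule), and `h_x - h_x_given_y` agrees with `h_y - h_y_given_x`, which is the mutual information of (1.10).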
The convexity of relative entropy has an important consequence for
channel coding, as we will see:

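Theorem 3 [Partial concavity of mutual information] For fixed transition
probabilities $q^{Y|X}$, the mutual information is a concave function of $\mathbf{p}^X$.

Proof 4 From Bayes' rule, the output distribution

$$ q^Y_j = \sum_{i=1}^{n} q^{Y|X}_{j|i}\, p^X_i \tag{1.13} $$

is a linear function of $\mathbf{p}^X$, thus $H(\mathbf{q}^Y)$ is a concave function of
$\mathbf{p}^X$. The mutual information can be expressed as:

$$ I(X;Y) = H(\mathbf{q}^Y) + \sum_{i=1}^{n}\sum_{j=1}^{m} p^X_i\, q^{Y|X}_{j|i}\log q^{Y|X}_{j|i} \tag{1.14} $$

The second term is a linear function of $\mathbf{p}^X$; hence, the whole
expression is concave in $\mathbf{p}^X$.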



1.2 Simplest Scenario for Communication 

In the simplest case of information transfer, at least three stages can
be identified: the source of information (or transmitter), the channel
over which messages are sent, and the sink (or receiver). The source is
modeled as a probability space $(\Omega, \mathcal{A}_\Omega, \mu)$. Typically, every outcome of
the source has to be processed in order to build a suitable message
that can be sent over the channel. This is mathematically represented
by a measurable function from the source's emitted messages to a given
alphabet (usually a binary alphabet), and in practice is called coding.
Conversely, in order to deliver the original information to the sink,
similar functions must be defined from the alphabet of the received
messages to the original alphabet (on the assumption that transmitter
and receiver share the same language). This involves statistical
estimation and is called decoding.
The functions $f_E$, $g_E$, $f_D$ and $g_D$ are measurable functions, so $(\mathcal{W}, \mathcal{A}_W)$
and $(\mathcal{U}, \mathcal{A}_U)$ can be viewed as probabilizable spaces with probabilities
induced by $p(W = w_i) = p(U \in f_E^{-1}(w_i))$, and so on. This notion of
inherited probability is fundamental because relative entropy is a function
defined on probability simplices.



[Figure: block diagram of the chain source $U(\Omega)$, source encoder $W = f_E(u)$, channel encoder $X = g_E(w)$, channel, channel decoder, source decoder, sink.]

Figure 1.1: Simplest Scenario for Communication: In the sender-receiver
scheme, the messages randomly emitted by the source are first compressed
at the source encoder and then fed to the channel encoder. The
channel encoder maps them to codewords resilient against channel
noise, so that the original compressed message can be recovered reliably
at the channel decoder. The source decoder decompresses the messages
and delivers them to the receiver, or information sink.

Remarkably, the process of coding and decoding is entirely deterministic.
Randomness is introduced at two levels, of different nature. The first
is the source itself, where the sample space could be anything,
e.g. all the thoughts of a person talking on the phone. A random variable
U, defined on $\Omega$ and taking values in $\mathcal{U}$ ($\|\mathcal{U}\| = n$), represents the
resulting physical messages that are emitted by the source, e.g. a
series of phonemes which are a function of the thoughts of the person
who talks. At a second stage, uncertainty is introduced in the channel,
and is related to the noise (fading, interference, outages...) that every
physical channel induces in an information carrier. In fact, a channel
is represented by the tuple $(\mathcal{X}, T_{Y|X}, \mathcal{Y})$, where $\mathcal{X}$ and $\mathcal{Y}$ are the input
and output alphabets, respectively, and $T_{Y|X}$^4 is a stochastic transition
matrix such that $\mathbf{q}^Y = T_{Y|X}\,\mathbf{p}^X$.

^4 A generalized transition matrix would be of the form $T_{Y_m|X^m Y^{m-1}}$. Here we will
refer only to discrete memoryless channels without feedback.

Two main questions arise in this context, and they are all that
Information Theory is concerned with:

Channel Capacity What is the maximum rate that can be achieved
in sending information over a channel? This question is approached
in practice in the design of channel encoders and decoders. Later we
shall see that

$$ R \le \max_{\mathbf{p}^X} I(X;Y) $$

Error-correcting codes are mainly devoted to approaching this rate.

Rate-Distortion Theory What is the minimum rate at which a
source can be compressed (that is, stripped of the redundant parts of
its outputs) while keeping the received messages below a
distortion threshold D?

$$ R \ge \min_{\mathbf{p}^{\hat{U}|U}:\; d(U,\hat{U}) \le D} I(U;\hat{U}) $$

In the simplified case where the channel is noiseless, or whenever it
is possible to estimate U perfectly (or just assume that $d(U,\hat{U}) = 0$),
Rate-Distortion reduces to Lossless Data Compression:

$$ I(U;\hat{U}) = I(U;U) = H(U) $$

and the inequality becomes $R \ge H(\mathbf{p}^U)$.

There exists a nice duality between these two problems that can be
appreciated when they are expressed as optimization programs [4].



1.3 Asymptotic Equipartition Property 

Typically sources will emit more than one output. Thus, we need to
characterize them as stochastic processes rather than as just random
variables. Consider a source described by $(\Omega, \mathcal{A}_\Omega, \mu)$ and a map
$T : \Omega \to \Omega$ which plays the role of a time shift in the sample space.
This is a dynamical system, and one can derive a stochastic process from
it: $U(T^j\omega) = U_j$, $\omega \in \Omega$.
Figure 1.2: All information theory is concerned about: The rate of
information transfer is upper-bounded by the maximum capacity of inference
of the receiver, which is related to the mutual information. On the lower
part of the scheme we see that no rate is possible below that allowed
for a given distortion threshold. For a discrete noiseless channel, the
distortion can be taken to be zero and the lower bound reduces to the
entropy of the source.

If for all events $\omega \in \mathcal{A}_\Omega$ we have that $\mu(T^{-1}\omega) = \mu(\omega)$, and every
invariant event has measure 1 or 0, then the source is stationary and
ergodic, and Birkhoff's Theorem holds [3]:

$$ \lim_{m\to\infty}\frac{1}{m}\sum_{j=1}^{m} f(T^j\omega) \xrightarrow{P=1} E_\mu[f] \tag{1.15} $$

where $\xrightarrow{P=1}$ denotes convergence with probability 1. If we now
consider the sequence $\{U_j\}_{j=1}^{m}$ and regard
$-\log p^{U^m} = -\sum_{j=1}^{m}\log p^{U_j|U^{j-1}}$
as a random variable itself^5, a function of $U^m$ ^6, then:

$$ \lim_{m\to\infty} -\frac{1}{m}\sum_{j=1}^{m}\log p^{U_j|U^{j-1}} \xrightarrow{P=1} E\left[-\log p^{U_j|U^{j-1}}\right] = H(\mathcal{U}) \tag{1.16} $$

$H(\mathcal{U})$ is the entropy rate of the stochastic process. It can be
interpreted as the entropy of the last random variable in the sequence given
all its past. Finally we get:

$$ \lim_{m\to\infty} -\frac{1}{m}\log p^{U^m} \xrightarrow{P=1} H(\mathcal{U}) \tag{1.17} $$

Since convergence with probability 1 implies convergence in probability,
it is possible to write:

$$ P\left(\left| -\frac{1}{m}\log p^{U^m} - H(\mathcal{U})\right| < \epsilon\right) \to 1, \quad \forall\epsilon > 0 \tag{1.18} $$

whence, for the sequences that satisfy this condition, we obtain:

$$ 2^{-m(H(\mathcal{U})+\epsilon)} \le p^{U^m} \le 2^{-m(H(\mathcal{U})-\epsilon)} \tag{1.19} $$

Hence, for a fixed probability distribution $\mathbf{p}^U$, the most likely sequences
have an empirical entropy arbitrarily close to the true entropy. Practically
all the probability mass is localized in a proper subset of the set of all
possible output sequences. This characteristic of the sequences, a direct
consequence of ergodicity, is called the Asymptotic Equipartition Property
because, as m (the length of the sequence) grows, the most likely sequences
tend to be grouped in a proper subset called the Typical Set $T_\epsilon^m$, whose
cardinality satisfies $(1-\epsilon)\,2^{m(H(\mathcal{U})-\epsilon)} \le \|T_\epsilon^m\| \le 2^{m(H(\mathcal{U})+\epsilon)}$ and
which gathers almost all the probability ($P(T_\epsilon^m) \ge 1 - \epsilon$), whereas
unlikely sequences tend to have a vanishing probability. Also, as m tends
to infinity, all typical sequences become equally probable^7.

For simplicity, we will restrict ourselves to stationary, independent,
identically distributed (i.i.d.) processes, in which case the entropy rate
takes the form:

$$ H(\mathcal{U}) = \lim_{m\to\infty}\frac{H(\mathbf{p}^{U^m})}{m} = H(\mathbf{p}^U) \tag{1.20} $$

which can be interpreted as the entropy per symbol of m random
variables. Finally we come to a weak version of the Asymptotic
Equipartition Property:

$$ \lim_{m\to\infty} -\frac{1}{m}\log p^{U^m} \xrightarrow{P} H(\mathbf{p}^U) \tag{1.21} $$

where convergence is now in probability.

^5 $\mathbf{p}$ in boldface denotes a probability distribution, while $p^x$ will denote the
probability of a particular occurrence of X, $p(X = x)$.

^6 $U^j$, with upper index, is a shorthand for the sequence $U_1 U_2 \cdots U_j$.

^7 This is in analogy with ensembles of statistical mechanics, where all points in
phase space are assumed to be equally likely.
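
The concentration of probability on the typical set can be made concrete for an i.i.d. binary source. The sketch below is illustrative only: it assumes a Bernoulli source with a hypothetical parameter p = 0.2 and tolerance 0.1, and it groups sequences by their number of ones so that the typical set can be enumerated exactly rather than sampled.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_set_stats(p, m, eps):
    """Exact probability mass and size of the weakly typical set for length-m sequences.

    A sequence with k ones has probability p^k (1-p)^(m-k); it is typical when
    |-(1/m) log2(probability) - H(p)| < eps.
    """
    h = binary_entropy(p)
    mass, size = 0.0, 0
    for k in range(m + 1):
        log_prob = k * math.log2(p) + (m - k) * math.log2(1 - p)
        if abs(-log_prob / m - h) < eps:
            count = math.comb(m, k)          # number of sequences with k ones
            size += count
            mass += count * 2.0 ** log_prob
    return mass, size

# Hypothetical source parameter and tolerance
results = {m: typical_set_stats(0.2, m, 0.1) for m in (50, 200, 800)}
```

For growing m the first component of each entry of `results` (the probability of the typical set) approaches one, while the second component (its size) stays below $2^{m(H+\epsilon)}$, far fewer than the $2^m$ possible sequences.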

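Similarly, it is possible to define jointly typical sequences $(X^m, Y^m)$
with respect to a joint probability distribution $\mathbf{p}^{XY}$ as the sequences for
which:

$$ \lim_{m\to\infty} -\frac{1}{m}\log p^{X^m} \xrightarrow{P} H(\mathbf{p}^X) \tag{1.22} $$

$$ \lim_{m\to\infty} -\frac{1}{m}\log p^{Y^m} \xrightarrow{P} H(\mathbf{q}^Y) \tag{1.23} $$

$$ \lim_{m\to\infty} -\frac{1}{m}\log p^{X^m Y^m} \xrightarrow{P} H(\mathbf{p}^{XY}) \tag{1.24} $$

As before, in the asymptotic limit, only typical pairs will take place.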

1.4 Shannon's Source and Channel Coding Theorems

In his foundational paper [5], Shannon laid the foundations of Information
Theory. He stated both problems described above (source and channel
coding) and offered the first solution. For this, he used the concept of
random coding, which is not to be understood as a random map between
alphabets, but rather as a proof of existence of at least one coding scheme
that attains the bound. However, his derivations were based on (weak)
typicality and were only asymptotically optimal, and therefore remained of
little practical interest until codes were found that performed close to
the limit.
1.4.1 Source Coding

Consider a source that generates a random sequence of outputs $\{U_j\}_{j=1}^{m}$,
and an encoding function:

$$ f_E : \mathcal{U}^m \to \mathcal{W} $$

with $W \in \mathcal{W} = \{1, 2, \ldots, 2^{mR}\}$. Here message W is indexed by the
realization of the sequence $u^m$. The cardinality of $\mathcal{W}$ will be $\|\mathcal{W}\| = 2^{mR}$,
where R is the rate of the code. Most commonly $\mathcal{W} \subset \{0, 1\}^*$ and
W will be a sequence of bits. W is the encoding of the source.

In order to quantify the fidelity of the code one should adopt one of
the following criteria:

• $d(U, \hat{U}) < \epsilon, \ \forall\epsilon$

• $P(\hat{U} = U) > 1 - \epsilon, \ \forall\epsilon$

We will use the second one, which is best suited for derivations based 
on weak typicality. 



[Figure: block diagram of the chain source $U^m(\Omega)$, source encoder $W = f_E(u^m)$, source decoder $\hat{U}^m = f_D(W)$.]

Figure 1.3: Source Coding: The sequences emitted by the source will
typically have a redundant part, due to possible correlations between
symbols or strings. These redundant parts do not contain much information
and it is desirable to get rid of them so that not so many channel uses
are required to transmit the source. This redundancy is quantified by the
entropy rate of the source, since sequences of length m are mapped (on
average) to sequences of length $mH(\mathcal{U})$, which will typically be shorter.
Theorem 4 [Lossless Source Coding] An i.i.d. source $\{U_j\}_{j=1}^{m}$ can be
reliably compressed (with vanishing probability of error) at a rate R if
and only if $R > H(\mathbf{p}^U)$.

The proof is based on asymptotic expressions, so it will be optimal
in the limit $m \to \infty$.

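Proof 5 (⇒ Proof of Achievability) An error occurs whenever one
of the following events happens:

• The sequence is not typical: $E_0 = \{U^m \notin T_\epsilon^m\}$

• A codeword is indexed by more than one typical sequence^8:
$E_1 = \{U^m \in T_\epsilon^m,\ \exists\, u'^m \ne U^m : u'^m \in T_\epsilon^m,\ f_E(u'^m) = f_E(U^m)\}$

Using the union bound, the error probability is:

$$
\begin{aligned}
P_e &\le P\{E_0\} + P\{E_1\} \\
 &\le \epsilon + \sum_{u^m \in T_\epsilon^m} p^{u^m} \sum_{u'^m \in T_\epsilon^m,\ u'^m \ne u^m} P\{f_E(u'^m) = f_E(u^m)\} \\
 &\le \epsilon + \sum_{u^m} p^{u^m} \sum_{u'^m \in T_\epsilon^m} 2^{-mR} \\
 &\le \epsilon + 2^{-m(R - H(\mathbf{p}^U) - \epsilon)}
\end{aligned}
\tag{1.25}
$$

The second inequality is obtained using the union bound and averaging
over all typical sequences; also, $P\{E_0\} \to 0$ as m grows. The third
inequality is obtained by enlarging the range of the sums. Note that,
for equiprobable codewords, the probability that two given sequences are
assigned the same codeword is $2^{-mR}$. The last inequality follows from
typicality arguments, since $\|T_\epsilon^m\| \le 2^{m(H(\mathbf{p}^U)+\epsilon)}$.
Thus, in the asymptotic limit where $m \to \infty$, if:

$$ R > H(\mathbf{p}^U) + \epsilon $$

the probability of error vanishes.

^8 This is a consequence of random coding: in choosing a code at random we risk
selecting a bad code.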

Proof 6 (⇐ Weak Converse) For codes with asymptotically vanishing
probability of error, the rate must necessarily satisfy $R \ge H(\mathbf{p}^U)$. To
show this, we will make use of Fano's inequality, which relates the
probability of error to the conditional entropy of a sequence given its
associated codeword W. It can be easily derived, so we do not prove it here:

$$ H(U^m|W) \le m P_e \log\|\mathcal{U}\| + 1 = m\epsilon_m \tag{1.26} $$

with $\epsilon_m \to 0$ as m grows, since the probability of error vanishes. Then:

$$
\begin{aligned}
mR &\ge H(\mathbf{p}^W) \\
 &= I(U^m;W) + H(W|U^m) \\
 &= I(U^m;W) \\
 &= H(\mathbf{p}^{U^m}) - H(U^m|W) \\
 &\ge mH(\mathbf{p}^U) - m\epsilon_m
\end{aligned}
$$

The first inequality comes from the upper bound on entropy,
$H(\mathbf{p}^W) \le \log\|\mathcal{W}\| = mR$. Since knowing $U^m$ eliminates the uncertainty
about W, $H(W|U^m) = 0$ and we have the second equality. In the second
inequality we have used Fano's inequality. The source is modeled by an
i.i.d. process, so $H(\mathbf{p}^{U^m}) = mH(\mathbf{p}^U)$.
1.4.2 Channel Coding

A channel is characterized by the tuple $(\mathcal{X}, T_{Y|X}, \mathcal{Y})$, where $T_{Y|X}$ is a
map between the probability simplices corresponding to the input and
output alphabets. While source coding is aimed at eliminating redundant
parts of the source's output (and is for this reason named data
compression), the goal of channel coding is to introduce some redundancy
in a controlled way, such that it helps to fight the errors induced by
the channel; it is suitably called error correction.

Let $g_E$ be a channel encoding function:

$$ g_E : \mathcal{W} \to \mathcal{X}^m $$

where $W \in \mathcal{W} = \{1, 2, \ldots, 2^{mR}\}$. Each codeword $X^m$ is indexed by a
message as before, and usually $\mathcal{X} \subset \{0, 1\}^*$.

The capacity of a discrete memoryless channel without feedback is
defined as:

$$ C = \max_{\mathbf{p}^X} I(X;Y) \tag{1.27} $$

and it is an upper bound on the attainable rates at which communication
can take place.
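
The maximization in (1.27) can be illustrated on the binary symmetric channel, whose capacity is known in closed form to be one minus the binary entropy of the crossover probability. The sketch below is a brute-force grid search over binary input distributions, which is an assumption made for clarity and not an efficient method; the crossover probability f = 0.1, the function names and the matrix layout T[y][x] = p(y|x) are illustrative choices.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def mutual_information(px, T):
    """I(X;Y) = H(Y) - H(Y|X) for input distribution px and channel T[y][x] = p(y|x)."""
    n_out, n_in = len(T), len(px)
    qy = [sum(T[y][x] * px[x] for x in range(n_in)) for y in range(n_out)]
    h_y_given_x = sum(px[x] * entropy([T[y][x] for y in range(n_out)])
                      for x in range(n_in))
    return entropy(qy) - h_y_given_x

def capacity_binary_input(T, steps=1000):
    """Grid search of eq. (1.27) over input distributions (a, 1 - a)."""
    best = max(range(steps + 1),
               key=lambda k: mutual_information([k / steps, 1 - k / steps], T))
    a = best / steps
    return mutual_information([a, 1 - a], T), a

f = 0.1                      # hypothetical crossover probability
bsc = [[1 - f, f],
       [f, 1 - f]]           # columns sum to one: stochastic transition matrix
C, a_star = capacity_binary_input(bsc)
```

The search returns the uniform input distribution as the maximizer, in agreement with Theorem 3: since mutual information is concave in the input distribution, the grid search cannot be trapped in a local maximum.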



Theorem 5 [Channel Coding] A channel $(\mathcal{X}, T_{Y|X}, \mathcal{Y})$ can be used to
transmit information reliably at a rate R if and only if $R < C$.



[Figure: block diagram of the chain $W(U)$, channel encoder $X^m = g_E(W)$, channel, channel decoder $\hat{W} = g_D(Y^m)$.]

Figure 1.4: Channel Coding: At this stage, the compressed sequences
are mapped into longer sequences by introducing redundancy.
The point is that, whereas the redundancy of the source's outputs was of
little use, the overhead introduced by the channel encoder can be used
to recover the original message even if it is corrupted by noise (but not
too much). The mapping that the channel encoder performs receives
the name of error-correcting code.

Proof 7 (⇒ Proof of Achievability) The probability of error, averaged
over all possible codes $\mathcal{C}$, is:

$$
\begin{aligned}
P_e &= \sum_{\mathcal{C}} p(\mathcal{C})\, P_e(\mathcal{C}) \\
 &= \sum_{\mathcal{C}} p(\mathcal{C})\, 2^{-mR}\sum_{w=1}^{2^{mR}} \lambda_w(\mathcal{C}) \\
 &= \sum_{\mathcal{C}} p(\mathcal{C})\, \lambda_1(\mathcal{C})
\end{aligned}
\tag{1.28}
$$

Here $\lambda_w = P\{\hat{W} \ne w \,|\, W = w\}$ is the conditional probability of error
given that message w was sent. A random choice of the code $\mathcal{C}$ symmetrizes
these probabilities, so we only need to consider the error probability
for one codeword. Consider the event:

$$ E_w = \{(X^m(w), Y^m) \in T_\epsilon^m\} $$

that is, both sequences are jointly typical. There are about $2^{mH(X,Y)}$
such pairs of sequences. The probability of error can then be expressed
as:



$$
\begin{aligned}
P_e &= P\Big\{\bar{E}_1 \cup \bigcup_{i=2}^{2^{mR}} E_i\Big\} \\
 &\le \epsilon + \sum_{i=2}^{2^{mR}} P\{E_i\} \\
 &= \epsilon + \sum_{i=2}^{2^{mR}} \frac{2^{m(H(X|Y)\pm\epsilon)}}{2^{m(H(X)\pm\epsilon)}} \\
 &\le \epsilon + (2^{mR} - 1)\, 2^{m(H(X|Y) - H(X) + 2\epsilon)}
\end{aligned}
\tag{1.29}
$$

$\bar{E}_1$ is the complementary event of $E_1$, and its probability vanishes
as m grows. The second equality is obtained from joint typicality
arguments: for a given output sequence $Y^m$, there are about $2^{mH(X|Y)}$ jointly
typical input sequences $X^m$. Since there are about $2^{mH(X)}$ typical input
sequences, the probability that a codeword other than the transmitted one
is jointly typical with the received sequence is $2^{-m(I(X;Y)\pm 2\epsilon)}$. Thus,
the error probability will tend to zero as long as $R < I(X;Y) - 2\epsilon$.

Proof 8 (⇐ Weak Converse) Once again, we will make use of Fano's
inequality (see 1.26), but now the roles are somewhat interchanged:

$$ H(W|Y^m) \le m P_e R + 1 = m\epsilon_m \tag{1.30} $$

Assuming that the messages W are equiprobable:

$$
\begin{aligned}
mR &= H(\mathbf{p}^W) \\
 &= I(W;Y^m) + H(W|Y^m) \\
 &\le I(X^m;Y^m) + H(W|Y^m) \\
 &\le H(\mathbf{q}^{Y^m}) - H(Y^m|X^m) + m\epsilon_m \\
 &\le \sum_{i=1}^{m}\left[H(\mathbf{q}^{Y_i}) - H(Y_i|X_i)\right] + m\epsilon_m \\
 &= \sum_{i=1}^{m} I(X_i;Y_i) + m\epsilon_m \\
 &\le mC + m\epsilon_m
\end{aligned}
$$

The first inequality comes from the fact that the information contained
in $Y^m$ about W should be less than or equal to the information that
$Y^m$ contains about $X^m$, since $X^m$ is a function of W. The second inequality
comes from Fano's inequality. The third one comes from the independence
bound on $H(\mathbf{q}^{Y^m})$, together with the fact that the channel is memoryless.
The fourth one comes from the definition of capacity (1.27), as
the maximum attainable mutual information. So if the probability of error
goes to zero as m grows, $\epsilon_m \to 0$, and then we have that $R \le C$.
Chapter 2



Quantum Mechanics as a 
Statistical Theory 

Physical theories deal with observable features of Nature. For a theory
to be accepted, it must be capable of predicting the outcomes of
experiments and phenomena within its logical framework. Otherwise it is
confined to the realm of mathematical games. This implies that
any theory must account for measurements, that is, besides describing
Nature, it must describe how we obtain knowledge from Nature. Since
scientific theories rely on evidence for justification, this should be done
on a statistical basis. Measurements are subject to statistical
fluctuations, although some theories, such as astronomy, disregard this fact
owing to the invariant nature of their observations. In general, however,
any theory ought to include a complete statistical model that allows one to
infer system properties from measurement outcomes.

A statistical model is a part of any theory, and it consists of: 

Preparations This refers to the states of the systems under consideration,
like the setup of an experiment, which in classical theories
are directly related to a point in phase space.

Measurements Procedures by which physicists glean information about 
the systems from obtained data, which are obviously correlated to 
its state. 
Mathematically this pair is denoted $(\mathfrak{S}, \mathfrak{M})$, where $\mathfrak{S}$ is the set of all
possible preparations and $\mathfrak{M}$ is the set of all possible measurements on
these preparations.

There may be, and there usually is, uncertainty associated with both
preparations and measurements. What makes Quantum Mechanics
different is precisely the nature of these uncertainties.

Most contents of this chapter can be found in [6] [7] [8] [9] . 



2.1 Quantum Formalism 

In Quantum Mechanics a state is defined as an equivalence class of
preparations. This means that two preparations are to be considered
equivalent if they lead to parallel vectors in state space^1.

Quantum Mechanics arises as a probabilistic theory due
to a very fundamental property of sub-microscopic systems, known as
the Superposition Principle, by virtue of which a quantum system may
find itself in a complex linear combination of states. This is the hallmark
of Quantum Mechanics. This property, together with the definition of
state in the previous paragraph, leads to a statistical model where the set
of preparations is strongly convex, in contrast to classical statistical
models, where it is just convex.

The outcome of a measurement will depend probabilistically on the
respective weights of the superposed states. This demands that
experimenters be able to prepare statistical ensembles of the same state in
order to compare experimental data with theoretical predictions. This
automatically leads to two different (but closely interrelated) notions of
probability. The first is related to the fundamental behavior of the
sub-microscopic world, and the second (somewhat more classical) concerns
the distribution of ensembles.



^1 Here, state space is a Hilbert space $\mathcal{H}$ where vectors $\psi \in \mathcal{H}$ represent
preparations. Two vectors are equivalent if they are parallel, that is, if they are
the same up to a proportionality constant. For this reason, at a basic level we will
identify states with rays in Hilbert space, rather than vectors.
2.1.1 Set of States is Convex

The need for both quantum uncertainties and classical ensembles is best
met within the C*-algebra formalism. A C*-algebra $\mathfrak{E}$ is a Banach^2
space with unit $\mathbb{1}$ and a *-involution such that:

$$ \|AB\| \le \|A\|\,\|B\| \tag{2.1} $$

$$ \|A\|^2 = \|A^*\|^2 = \|AA^*\| \tag{2.2} $$

with $A, A^*, B \in \mathfrak{E}$. $A^* = A^\dagger$ stands for the adjoint of A, meaning
that the algebra is closed under the adjoint operation.

Every C*-algebra can be seen as a *-subalgebra of the algebra of
bounded operators on a Hilbert space $\mathcal{H}$, $\mathcal{B}(\mathcal{H})$ [10], so it inherits the
inner product:

$$ \langle A, B\rangle = \mathrm{Tr}(A^\dagger B) \tag{2.3} $$

Now consider the algebra $\mathfrak{A} \subset \mathcal{B}(\mathcal{H})$. A state $\varrho$ is a positive linear
functional on this subalgebra, which maps elements in the positive cone
$\mathfrak{A}^+$ of $\mathfrak{A}$ to nonnegative real numbers. We will only consider those
functionals that fulfil $\varrho(\mathbb{1}) = 1$, for reasons that will become clear shortly.

Let $A \in \mathfrak{A}^+$ be a positive operator; then one can establish the
one-to-one correspondence $\varrho(A) = \mathrm{Tr}(\rho A)$, where $\rho \in \mathfrak{A}^+$ is a positive,
self-adjoint operator of trace one, called the density operator. The
requirement that the operator have trace one is related to probability
normalization. We will subsequently identify the functional $\varrho$ with the
operator $\rho$.

Density operators will play a role analogous to probability
distributions in classical probability. Whereas a probability simplex $\Delta_n$
has only n vertices, each corresponding to a distribution where all the
probability mass is accumulated at just one outcome, density operators
live in a strongly convex set, meaning that there is an infinite number of
extremal points, as a consequence of the Superposition Principle. This is
depicted in fig. 2.1, where the simplex for two classical outcomes is
compared with the set of all possible quantum preparations of a two-state
system.

^2 Loosely stated, a Banach space is like a Hilbert space, except that the norm need
not come from an inner product, so orthogonality is not necessarily defined.

[Figure: two panels, "Set of classical states" and "Set of quantum states"; the quantum panel is labeled with $|0\rangle$.]

Figure 2.1: The set of quantum states is strongly convex: Classically,
the set of states is given by the probability simplex $\Delta_2$: any convex
combination of the two attainable states, 0 and 1, remains a state. The
fundamental axiom of Quantum Mechanics, the Superposition Principle,
says that it is possible for a quantum bit to be in state $|0\rangle$, in state $|1\rangle$,
or in a complex linear combination of both. This leads to a set of states
where every superposition of states (in fact, there are infinitely many)
must still be contained in the set. The set of quantum states is thus
strongly convex, since there are infinitely many extremal points (living
in a finite dimensional space). For a quantum bit, this set is called
Bloch's ball, and the coordinates of each extremal point can be worked
out from the relative phases of the pure states.

The set of quantum preparations in the previous figure receives the name
of Bloch's ball. Throughout this dissertation we will assume that $|0\rangle$ and
$|1\rangle$ form our selected computational basis. This means that these vectors
constitute a basis stable against decoherence and stand for the quantum
counterpart of a bit. $|0\rangle$ and $|1\rangle$ can refer to the spin of a nucleus, the
polarization of a photon, or the state of a bistable atom. In any
case, the number of degrees of freedom is two. Hence, it is the system's
algebra that receives the name of qubit, as a shorthand for quantum bit.
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In general, a N-states system is described by an algebra of N x N 
matrices and A''^ are needed to build a basis. One suitable basis is: 

i=l 

where {o'i}^^~^ is some basis of self-adjoint, trace-free matrices, such 
that: 

{(Tiaj) = adij (2.5) 
N 

ri = -{ai,g) (2.6) 
a 

So there is a mapping from density operators of dimension N to real
vectors in $\mathbb{R}^{N^2-1}$. For N = 2, this basis is the Pauli
matrices and $\mathrm{Tr}(\varrho^2) = \mathrm{Tr}(\varrho) = 1$ if and
only if $\|r\|_2 = 1$, i.e., rank one density operators lie in the
boundary of Bloch's ball. A density operator having rank one is called
a pure state, and otherwise is called a mixed state. Any mixed state
can be expressed as a convex combination of pure states:

$$\varrho_{\mathrm{mixed}} = \sum_j q_j \varrho_j \qquad (2.7)$$

where $\sum_j q_j = 1$, and $\varrho_j = |\psi_j\rangle\langle\psi_j|$
are rank one density operators.
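The mapping from a qubit density operator to its Bloch vector can be
checked numerically. The following is a minimal sketch, assuming the
Pauli matrices as the basis (so N = 2 and a = 2 in eqs. 2.5 and 2.6);
the function names `bloch_vector` and `purity` are illustrative and do
not come from the text.

```python
# Pauli matrices as nested lists (assumed basis, with Tr(sigma_i sigma_j) = 2 delta_ij).
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def bloch_vector(rho):
    """r_i = (N/a) <sigma_i, rho>, which is Tr(sigma_i rho) for N = a = 2."""
    return [trace(matmul(s, rho)).real for s in (SIGMA_X, SIGMA_Y, SIGMA_Z)]


def purity(rho):
    """Tr(rho^2): equals 1 for pure states and is smaller for mixed ones."""
    return trace(matmul(rho, rho)).real


plus = [[0.5, 0.5], [0.5, 0.5]]       # rank one: lies on the boundary
mixed = [[0.75, 0.0], [0.0, 0.25]]    # diagonal mixture: lies inside the ball

print(bloch_vector(plus), purity(plus))
print(bloch_vector(mixed), purity(mixed))
```

For the pure state the Bloch vector has unit length, while for the
diagonal mixture it lies on the segment joining the poles, strictly
inside the ball.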

2.1.2 Set of Measurements is Also Convex 

A measurement $M \in \mathfrak{M}$ is an affine map from $\mathfrak{S}$
to the set of all probability distributions in some probability space
$(U, \mathcal{A}_U, P)$.

Classically, this can reflect the statistical bias of a measuring
apparatus or procedure, and amounts to a reshaping of the probability
simplex. In the quantum world, a measurement is defined on a strongly
convex set, where "quantum probabilities" live, and takes values in a
classical probability simplex, so some structure must necessarily be
lost in a measurement process. This is sometimes called the wave-packet
collapse, or decoherence (see section 2.3.1).

Consider the measurement $M(\varrho) = p^U$. We shall write
$M = \{M(u_1), M(u_2), \ldots, M(u_k)\}$ where each
$M(u_j) \in \mathfrak{A}^+$ is a positive operator, associated to a
$u_j$ in $\mathcal{A}_U$ (*). The probability of event $u_j$ is given
by:

$$p(u_j) = \langle \varrho, M(u_j) \rangle \qquad (2.8)$$

A measurement is also called a Positive Operator Valued Measure
(POVM), since it relates a probability measure with an operator in the
positive cone of the algebra $\mathfrak{A}$. A POVM has the following
properties:

$$M(\emptyset) = 0 \qquad (2.9)$$
$$M(U) = \mathbb{1} \qquad (2.10)$$
$$u_i \subset u_j \Rightarrow M(u_i) \leq M(u_j) \qquad (2.11)$$
$$u_i = \bigcup_j u_j \Rightarrow M(u_i) = \sum_j M(u_j) \qquad (2.12)$$

where the $u_j$ in (2.12) are pairwise disjoint. Since it is required
that the whole sample space be covered, i.e., $\bigcup_j u_j = U$, then:

$$\sum_{j=1}^{k} M(u_j) = \mathbb{1} \qquad (2.13)$$

which ensures that probability is normalized,
$p(U) = \mathrm{Tr}(\varrho \mathbb{1}) = 1$. The $M(u_j)$ constitute a
resolution of the identity.

Note that, even if $u_i \cap u_j = \emptyset$, in general it still may
be the case that $M(u_i)M(u_j) \neq \delta_{ij}M(u_j)$. In this case we
have a non-orthogonal resolution of the identity, also known as a fuzzy
measurement.

Whenever $u_i \cap u_j = \emptyset \Rightarrow M(u_i)M(u_j) = \delta_{ij}M(u_j)$
holds, we have a projective measurement. This is justified because the
$M(u_i)$, satisfying $M(u_i)^2 = M(u_i)$, are projectors. Projective
measurements are extremal points of $\mathfrak{M}$, and are also called
von Neumann measurements. The converse is in general not true: there
can be non-orthogonal resolutions of the identity at the extremal
points of $\mathfrak{M}$. However, for qubits, where $U = \{0, 1\}$, the
converse is true [6] and one can say that a measurement is an extremal
point of $\mathfrak{M}$ if and only if it is a projective measurement.

(*) These events need not be elementary events: as members of the
$\sigma$-algebra $\mathcal{A}_U$, they may in general be subsets of the
sample space $U$.
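A non-orthogonal resolution of the identity can be made concrete with
a small numerical sketch. The three-outcome "trine" measurement below is
an illustrative choice that does not appear in the text, and the helper
names are hypothetical; the code only checks the normalization (2.13),
the probability rule (2.8), and the failure of orthogonality.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def trine_povm():
    """Three rank-one elements M_k = (2/3)|phi_k><phi_k| (assumed example),
    with |phi_k> = cos(k pi/3)|0> + sin(k pi/3)|1>."""
    elements = []
    for k in range(3):
        c, s = math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)
        elements.append([[2 / 3 * c * c, 2 / 3 * c * s],
                         [2 / 3 * c * s, 2 / 3 * s * s]])
    return elements


def probabilities(rho, povm):
    """p(u_j) = <rho, M(u_j)> = Tr(rho M(u_j)), as in eq. 2.8."""
    return [trace(matmul(rho, m)) for m in povm]


povm = trine_povm()
# Sum of the elements: should reproduce the identity, eq. 2.13.
total = [[sum(m[r][c] for m in povm) for c in range(2)] for r in range(2)]
# Overlap between two different elements: nonzero for a fuzzy measurement.
overlap = trace(matmul(povm[0], povm[1]))
print(total, overlap, probabilities([[1.0, 0.0], [0.0, 0.0]], povm))
```

The elements add up to the identity, yet the product of two different
elements does not vanish, so this is a fuzzy measurement rather than a
projective one.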

Observables are directly related to projective measurements through
the Spectral Theorem, which says that any self-adjoint operator $X$
admits the spectral representation:

$$X = \sum_i u_i M(u_i) \qquad (2.14)$$

where $M(u_i)$ is an orthogonal resolution of the identity. The $u_i$
constitute the spectrum of the observable and the $M(u_i)$ determine
the eigenspace associated to each eigenvalue.

In practice, it is not known how to implement a general non-orthogonal
POVM, defined in a state space $\mathcal{H}_S$. However, Neumark's
Theorem [6] ensures that it is possible to simulate a POVM with a
projective measurement defined in an extended space
$\mathcal{H}_S \otimes \mathcal{H}_A$. The letter "A" stands for
ancillary system.



2.2 Von Neumann's Entropy 

Just as classical entropy is defined on a probability simplex, it is
possible to define an entropy for quantum probability distributions,
called von Neumann's entropy, which is defined on the set of quantum
states:

$$S(\varrho) = -\mathrm{Tr}(\varrho \log \varrho) \qquad (2.15)$$

In the case of orthogonal states, it reduces to Shannon's entropy. In
fact, many properties (but not all) of classical entropy still hold in
the quantum case. We can derive them, as in the previous chapter, from
the more fundamental quantum relative entropy:

$$S(\varrho\|\sigma) = \mathrm{Tr}(\varrho \log \varrho - \varrho \log \sigma) \qquad (2.16)$$
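For a qubit, eq. 2.15 can be evaluated directly from the two
eigenvalues of the density operator. The sketch below assumes
logarithms in base 2 (entropy in bits) and the convention
$0 \log 0 = 0$; the function names are illustrative.

```python
import math


def qubit_eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    t = (rho[0][0] + rho[1][1]).real
    d = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    root = math.sqrt(max(t * t - 4 * d, 0.0))
    return [(t + root) / 2, (t - root) / 2]


def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho) in bits; tiny eigenvalues are treated as zero."""
    return -sum(x * math.log2(x) for x in qubit_eigenvalues(rho) if x > 1e-12)


pure = [[0.5, 0.5], [0.5, 0.5]]          # |+><+|
diagonal = [[0.75, 0.0], [0.0, 0.25]]    # classical distribution (0.75, 0.25)
maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]

print(von_neumann_entropy(pure))
print(von_neumann_entropy(diagonal))
print(von_neumann_entropy(maximally_mixed))
```

The pure state has zero entropy, the diagonal state reproduces the
Shannon entropy of its diagonal, and the maximally mixed qubit carries
one bit.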
Theorem 6 [Nonnegativity of quantum relative entropy] The quantum
relative entropy is positive semidefinite, $S(\varrho\|\sigma) \geq 0$.

Proof 9 It is possible to find a diagonalization for each density
operator, $\varrho = \sum_i p_i |i\rangle\langle i|$ and
$\sigma = \sum_j q_j |j\rangle\langle j|$, then:

$$S(\varrho\|\sigma) = \sum_i p_i \Big[\log p_i - \sum_j \log q_j\, |\langle i|j\rangle|^2\Big]$$
$$= \sum_i p_i \Big[\log p_i - \sum_j D_{ij} \log q_j\Big]$$
$$\geq \sum_i p_i \Big[\log p_i - \log \sum_j D_{ij} q_j\Big]$$
$$= \sum_i p_i \log \frac{p_i}{r_i}$$
$$\geq 0$$

with $r_i = \sum_j D_{ij} q_j$. The diagonalizations need not be equal,
thus the possible overlap between the states must be accounted for.
This overlap is encoded in a doubly stochastic matrix
$D_{ij} = |\langle i|j\rangle|^2 \geq 0$. The first inequality comes
from a slight variation of Jensen's inequality and the concavity of the
logarithm. The last inequality is just a property of the classical
relative entropy (see Theorem 1).
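The chain of inequalities in the proof can be traced numerically for
qubits. The sketch below assumes two full-rank states whose eigenbases
differ by a real rotation of angle `theta`, so that $D$ takes the simple
form shown in the code; this parametrization and the function names are
assumptions made for illustration.

```python
import math


def overlap_matrix(theta):
    """D_ij = |<i|j>|^2 for two qubit eigenbases rotated by theta."""
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    return [[c2, s2], [s2, c2]]


def quantum_relative_entropy(p, q, theta):
    """First line of the proof: sum_i p_i [log p_i - sum_j D_ij log q_j]."""
    d = overlap_matrix(theta)
    return sum(
        p[i] * (math.log2(p[i]) - sum(d[i][j] * math.log2(q[j]) for j in range(2)))
        for i in range(2)
    )


def classical_lower_bound(p, q, theta):
    """Fourth line of the proof: sum_i p_i log(p_i / r_i), with r = D q."""
    d = overlap_matrix(theta)
    r = [sum(d[i][j] * q[j] for j in range(2)) for i in range(2)]
    return sum(p[i] * math.log2(p[i] / r[i]) for i in range(2))


p, q, theta = [0.9, 0.1], [0.6, 0.4], 0.3
print(quantum_relative_entropy(p, q, theta), classical_lower_bound(p, q, theta))
```

For any strictly positive `p` and `q`, the first value should be at
least as large as the second, and the second should be nonnegative.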



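Theorem 7 [Subadditivity of von Neumann's Entropy] For a global system
$\varrho^{AB}$ the joint entropy satisfies
$S(\varrho^{AB}) \leq S(\varrho^A) + S(\varrho^B)$, with equality if and
only if both systems are uncorrelated.

Proof 10 As a consequence of the nonnegativity of the quantum relative
entropy, we can write:

$$S(\varrho\|\sigma) = -S(\varrho) - \mathrm{Tr}(\varrho \log \sigma) \geq 0$$

Taking $\varrho = \varrho^{AB}$ and $\sigma = \varrho^A \otimes \varrho^B$,
we obtain:

$$S(\varrho^{AB}) \leq -\mathrm{Tr}\big[\varrho^{AB}(\log \varrho^A \otimes \mathbb{1} + \mathbb{1} \otimes \log \varrho^B)\big]$$
$$= -\mathrm{Tr}(\varrho^A \log \varrho^A) - \mathrm{Tr}(\varrho^B \log \varrho^B)$$
$$= S(\varrho^A) + S(\varrho^B) \qquad (2.17)$$

To see that the bound is tight if and only if
$\varrho^{AB} = \varrho^A \otimes \varrho^B$, one need only consider the
relative entropy $S(\varrho^{AB}\|\varrho^A \otimes \varrho^B)$.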

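Theorem 8 [Concavity of von Neumann's entropy] Von Neumann's entropy
is a concave function of $\varrho$.

Proof 11 To prove this result, we will make use of a spurious system
B. Consider the joint state:

$$\varrho^{AB} = \sum_i p_i\, \varrho_i^A \otimes |i\rangle\langle i|^B$$

Its von Neumann's entropy is:

$$S(\varrho^{AB}) = S\Big(\sum_i p_i\, \varrho_i^A \otimes |i\rangle\langle i|^B\Big)$$
$$= S\Big(\sum_i p_i \Big(\sum_j \lambda_j^i |j\rangle\langle j|^A\Big) \otimes |i\rangle\langle i|^B\Big)$$
$$= -\sum_{i,j} p_i \lambda_j^i \log \lambda_j^i - \sum_{i,j} p_i \lambda_j^i \log p_i$$
$$= -\sum_{i,j} p_i \lambda_j^i \log \lambda_j^i - \sum_i p_i \log p_i$$
$$= \sum_i p_i S(\varrho_i^A) + H(p)$$

where $\sum_j \lambda_j^i |j\rangle\langle j|^A$ is a diagonalization of
$\varrho_i^A$. Note that $\sum_j \lambda_j^i = 1$. On the other hand,
the entropies of the separated systems are:

$$S(\varrho^A) = S\Big(\sum_i p_i \varrho_i^A\Big)$$
$$S(\varrho^B) = S\Big(\sum_i p_i |i\rangle\langle i|^B\Big) = H(p)$$

Making use of the subadditivity property proved before, we have that:

$$\sum_i p_i S(\varrho_i^A) + H(p) \leq S\Big(\sum_i p_i \varrho_i^A\Big) + H(p)$$
$$\sum_i p_i S(\varrho_i^A) \leq S\Big(\sum_i p_i \varrho_i^A\Big)$$

thus, the von Neumann's entropy is concave.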

2.3 Classical Information and Quantum Information are Not the Same

The difference between bits and qubits is more fundamental than just
terminology. Whereas classical bits are symbolic representations of the
information stored in a physical system (e.g. modulated waves, or the
orientation of the magnetic cells in a hard drive), qubits are to be
identified with physical systems, or with their algebra at least.
Quantum information is more general than classical information, since
the symbolic representation of information arises in the special case
where only orthogonal states are considered.

2.3.1 Classical Information through Decoherence

Since Quantum Mechanics supersedes classical theories, it is expected
that classical probability can as well be represented in the language
of density operators. Consider a probability distribution $p^X$ over
$\mathcal{X}$ ($|\mathcal{X}| = n$), then its operator counterpart can
be written as:

$$\varrho^X = \sum_{i=1}^{n} p_i |i\rangle\langle i| \qquad (2.18)$$

This density operator belongs to the algebra of diagonal matrices. In
the qubit case, this algebra is the set of density operators in the
segment that passes through the poles in Bloch's ball (see fig. 2.1).

Quantum states can be described by density operators having
off-diagonal terms, which are responsible for quantum interferences,
and this is directly related to the fact that the set of states is
strongly convex. How classical properties arise from quantum-mechanical
laws is itself a topic of intense research and receives the name of
Environmental Decoherence [11] [12]. In the information-theoretic
context of this thesis, it suffices to say that in the measuring
process both the measured system and the measuring apparatus evolve
together in time (according to some interaction Hamiltonian) into a
preferred diagonal basis, induced by the interaction of the measuring
apparatus and its environment [13].

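Let $\varrho^S = \sum_j q_j \varrho_j^S$, $\varrho_0^A$ and
$\varrho_0^E$ be the initial states of a system, an apparatus and their
environment, respectively. In a first step the system and the apparatus
become correlated, so that observing the apparatus will give us
information about the system. In a second step, the apparatus is left
alone to evolve along with its environment. This process is depicted in
fig. 2.2:

$$V_{SA}\Big(\sum_j q_j \varrho_j^S \otimes \varrho_0^A\Big)V_{SA}^\dagger = \sum_j q'_j \varrho_j^S \otimes \varrho_j^A \qquad (2.19)$$

$$U_{AE}\Big(\sum_j q'_j \varrho_j^S \otimes \varrho_j^A \otimes \varrho_0^E\Big)U_{AE}^\dagger = \sum_j q''_j \varrho_j^S \otimes \varrho_j^A \otimes \varrho_j^E \qquad (2.20)$$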

The measuring apparatus and the environment become rapidly correlated,
and the off-diagonal terms in the system's density operator are swept
away. Provided that the environment remains in a pure state and that
$\langle \varrho_i^E, \varrho_j^E \rangle \approx \delta_{ij}$ (*),
tracing out the environment (see section 3.1.1) leaves us with a
classical probability simplex:

$$\varrho^{SA} = \sum_i q''_i\, \varrho_i^S \otimes \varrho_i^A \qquad (2.21)$$

[Figure 2.2 labels: environment; measuring apparatus; set of quantum
states; probability simplex.]

Figure 2.2: Decoherence in the measuring process. The interaction of
the measuring apparatus with its environment causes the quantum
correlations to dilute in the joint Hilbert space of the system,
apparatus and environment. The crux of this process is that the states
of the environment are by definition inaccessible, so no measurement
could ever detect these correlations. Thus, locally, the joint
system-apparatus state appears to be in a diagonal matrix state, as a
consequence of tracing out the environment. In this picture, a
measurement with four outcomes is represented.

such that $[\ldots, U_{AE}] = 0$, i.e. they share a diagonal basis (**).
Eq. 2.21 is to be compared with eq. 2.18. The quantum probabilities
don't disappear, but get dispersed in the correlations between the
system and its environment.

Note that the expressions 2.19 and 2.20 are not correct in general,
since time evolution couples the system, the apparatus and their
environment in such a way that their global state cannot be expressed
as a product state: the final state may in general be entangled.
However, to illustrate the correlations that take place during the
measuring process we use these partially "allegoric" expressions. In
later chapters the correctness will be restored.

(*) This assumption basically comes from the fact that we don't observe
quantum interferences between macroscopic states.

(**) This means that the measuring apparatus evolves to a state which
is stable against decoherence, i.e., stationary under macroscopic time
evolution.

2.3.2 No-Cloning Theorem 

Another way to see the difference between classical and quantum in- 
formation is to imagine a machine capable of copying quantum states. 
The machine is fed at its input with an unknown quantum state and it 
outputs two copies of the initial state. This machine cannot exist: 

Theorem 9 [No-Cloning Theorem] It is impossible to copy unknown 
quantum states. 

Proof 12 Without loss of generality, we shall only consider pure
states. Consider two unknown states $\varrho_1$ and $\varrho_2$, which
are fed as input into the copying machine, itself in an initial pure
state $\varrho^{\mathrm{in}}$. The copying process is described as a
time evolution $U$ of the whole system:

$$U(\varrho_1 \otimes \varrho^{\mathrm{in}})U^\dagger = \varrho_1 \otimes \varrho_1 \qquad (2.22)$$
$$U(\varrho_2 \otimes \varrho^{\mathrm{in}})U^\dagger = \varrho_2 \otimes \varrho_2 \qquad (2.23)$$

If we now take the inner product of the two equations, we have that:

$$\langle \varrho_1, \varrho_2 \rangle = \langle \varrho_1, \varrho_2 \rangle^2 \qquad (2.24)$$

thus both states must be either orthogonal, or the same. This
requirement is in contradiction with the assumption of the two states
being unknown.

As expected, this is in accordance with the existence of classical
fan-out gates, which have the capacity of copying a bit as many times
as desired.
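The contrast between a fan-out gate and a would-be quantum copier can
be illustrated with a short sketch. It assumes a controlled-NOT gate
acting on the input qubit and a blank qubit in $|0\rangle$ as the
candidate copying machine; this particular gate is an illustrative
choice, not the general machine of the proof.

```python
import math


def kron(u, v):
    """Tensor product of two state vectors given as flat lists."""
    return [a * b for a in u for b in v]


def cnot(state):
    """CNOT in the basis |00>, |01>, |10>, |11>; the first qubit is the control."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


def fidelity(u, v):
    """|<u|v>|^2 for two normalized state vectors."""
    return abs(sum(x.conjugate() * y for x, y in zip(u, v))) ** 2


def copy_fidelity(psi):
    """Fidelity between CNOT(|psi>|0>) and the ideal clone |psi>|psi>."""
    attempt = cnot(kron(psi, [1.0, 0.0]))
    return fidelity(kron(psi, psi), attempt)


s = 1 / math.sqrt(2)
print(copy_fidelity([1.0, 0.0]))   # basis state |0>
print(copy_fidelity([0.0, 1.0]))   # basis state |1>
print(copy_fidelity([s, s]))       # superposition (|0> + |1>)/sqrt(2)
```

The two orthogonal basis states are copied perfectly, as a classical
fan-out would do, whereas the superposition is not: the output is an
entangled state rather than two copies.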

2.3.3 The Holevo's Bound 

Classically, the capacity of inference is related to the mutual
information. An observer (receiver) can reliably guess the value of an
experiment (or channel use) provided that $I(X;Y)$ is arbitrarily close
to $H(X)$. In principle, thanks to the use of better preparation and
measuring devices (equivalently, coding and decoding schemes), mutual
information can be brought very close to its upper bound.

Quantum mechanics prevents this fact, once again as a consequence 
of the Superposition Principle, because there may be states which are 
not orthogonal, and no measurement can, even in principle, distinguish 
them with 100% reliability. 

Theorem 10 [The Holevo's Bound] Let $X \in \mathcal{X}$ be encoded in
state $\varrho^X = \sum_{i=1}^n p_i^X \varrho_i$, where the $\varrho_i$
have orthogonal support, and let $M_Y(\varrho^X) = p^Y$ be a
measurement. The accessible information is upper bounded by:

$$I(X:Y) \leq S(\varrho^X) - \sum_{i=1}^n p_i^X S(\varrho_i) \qquad (2.25)$$

Proof 13 Mutual information can be written as:

$$I(X:Y) = H(p^X) - H(p^X|Y)$$

The last term represents the uncertainty about $X$ provided that
measurement $M_Y$ was chosen:

$$-H(p^X|Y) = \sum_{j=1}^m p(y_j) \sum_{i=1}^n p(x_i|y_j) \log p(x_i|y_j)$$

with $p(x_i|y_j) = \langle \varrho_i, M_j \rangle$. It is easy to see
that the conditional entropy will vanish if and only if
$\langle \varrho_i, M_j \rangle = \delta_{ij}$. Now, suppose that this
is indeed the case: the selected measurement scheme is optimal.
Reasoning in a similar way to Theorem 7, it is possible to write:

$$S\Big(\sum_{i=1}^n p_i^X \varrho_i\Big) = H(p^X) + \sum_{i=1}^n p_i^X S(\varrho_i)$$

The optimal measurement strategy yields:

$$I(X:Y) = H(p^X) = S\Big(\sum_{i=1}^n p_i^X \varrho_i\Big) - \sum_{i=1}^n p_i^X S(\varrho_i)$$

For measurements that are not optimal, we will have in general that:

$$I(X:Y) < S\Big(\sum_{i=1}^n p_i^X \varrho_i\Big) - \sum_{i=1}^n p_i^X S(\varrho_i)$$

Note that if the states $\varrho_i$ are chosen to be pure, the upper
bound in 2.25 reduces to the classical entropy. One direct conclusion
to be drawn from the previous Theorem is that the information contained
in a qubit is, at most, one bit. This discouraging result may lead us
to the opinion that quantum information has no real advantages over
classical information. As we will see in the next part, this belief is
wrong.
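The right-hand side of eq. 2.25 is easy to evaluate for a small qubit
ensemble. The sketch below assumes two equiprobable pure states
separated by a real rotation angle, with entropies in bits; the
ensemble and the function names are illustrative assumptions.

```python
import math


def entropy(rho):
    """Von Neumann entropy (bits) of a 2x2 density matrix via its eigenvalues."""
    t = rho[0][0] + rho[1][1]
    d = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    root = math.sqrt(max(t * t - 4 * d, 0.0))
    eigenvalues = [(t + root) / 2, (t - root) / 2]
    return -sum(x * math.log2(x) for x in eigenvalues if x > 1e-12)


def pure_state(theta):
    """Density matrix of cos(theta)|0> + sin(theta)|1>."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]


def holevo_quantity(probs, states):
    """S(sum_i p_i rho_i) - sum_i p_i S(rho_i), the bound in eq. 2.25."""
    average = [[sum(p * rho[r][c] for p, rho in zip(probs, states))
                for c in range(2)] for r in range(2)]
    return entropy(average) - sum(p * entropy(rho) for p, rho in zip(probs, states))


probs = [0.5, 0.5]
orthogonal = holevo_quantity(probs, [pure_state(0.0), pure_state(math.pi / 2)])
overlapping = holevo_quantity(probs, [pure_state(0.0), pure_state(math.pi / 4)])
print(orthogonal, overlapping)
```

For orthogonal states the bound equals one bit, the Shannon entropy of
the source; for overlapping states it is strictly smaller, in line with
the remark that non-orthogonal states cannot be reliably distinguished.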

2.4 Experiments as Information Transfer 

Perhaps it is illuminating to see that it is possible, just with a slight 
change in the terminology, to compare the two main scenarios that are 
occupying us: a communications channel, and a physical experiment. 

The main goal of both operations is to gain information about the
state of an inaccessible system. In a communications channel this
system is the source. In the experiment, it is an unknown system that
is forced to interact with another one, previously prepared in a
quantum state. Thus in the experiment scenario, information enters at
the evolution stage. Encoding is a procedure analogous to a
preparation. Whereas temporal evolution is totally deterministic, the
channel is of stochastic nature, but it still represents some sort of
time flow. After undergoing a temporal evolution or a channel, the sets
of states are distorted to some extent. Finally, both measurement and
decoding entail the estimation of a probability distribution out of the
incoming sets of states. In the case of experiments, uncertainty is
introduced at this stage, if the states are not orthogonal.

[Figure 2.3 panel label: EXPERIMENT.]

Figure 2.3: Pictorial duality between experiments and channels. In the
communication channel scenario, we represent a source taking values
in two symbols, encoded in a four symbol code. Then, stochastic
evolution takes place and the probability simplex is distorted
(dimension can increase, decrease or remain the same, and symbol
frequencies may change). At the decoder, the original two symbols
should be recovered with their original frequencies. In the case of
experiments, a quantum state is prepared. According to a known
Hamiltonian, the system undergoes a deterministic time evolution
jointly with the unknown system of which information is to be obtained.
Then several hypotheses are encoded in the set of states, as a result
of the joint time evolution.
Chapter 3

Quantum Non-locality 



To jointly describe multiple systems, an algebra is needed which
contains the elements of global measurements, as well as those of
partial (marginal) measurements. For bipartite systems, the algebra of
the composite system is:

$$\mathfrak{A} \otimes \mathfrak{B}$$

where $\mathfrak{A}$ and $\mathfrak{B}$ are the operator algebras of
the subsystems, respectively. For the algebra of diagonal matrices of
dimension $N$, that is, for classical distributions of $N$ different
outcomes, the number of orthogonal matrices needed to form a basis is
$N$. For two classical systems of the same dimension, we will need
$N^2$ such matrices. In the quantum case, since matrices need not be
diagonal, the number of matrices that form a basis is $N^2$, and for
two systems $N^4$ matrices would be needed.

For classical systems, if two different observers perform a
measurement, each one at a different system, they will each gain
$\log N$ bits of information. If they combine their information about
the subsystems, it will be possible to reconstruct the global state, as
it only demands $2 \log N$ bits.

In the quantum case, the Holevo's bound says that each observer can
gain at most $\log N$ bits. Thus, there is no way to learn about the
global state just from the marginal measurements, for it demands
$4 \log N$ bits, whereas there are only $2 \log N$ available.
This suggests that there is more information contained in the
composite system than in the sum of the information contained in its
components. This characteristic of quantum systems gives rise to a new
fundamental phenomenon called quantum non-locality.

3.1 EPR Paradox 

In their famous paper [14], Einstein, Podolsky and Rosen (EPR) came to
the conclusion that Quantum Mechanics was in an awkward epistemological
status, due to its lack of at least one of the properties required of
any theoretical framework which intends to describe Reality. These
properties can be stated as two principles:

Principle of Locality Two causally disconnected, i.e. spatially sepa- 
rated, measurements cannot exert any influence on one another. 

Principle of Realism Any physical theory must account for every el- 
ement of reality, this meaning that every possible outcome of an 
experiment should have a definite value prior to its measurement. 

EPR showed that Quantum Mechanics violates at least one of these two
principles, so a quantum description of Reality cannot be completely
accurate. This can be seen in a gedankenexperiment devised by Bohm
[15], which involves particles of spin one half.

Consider the pure state of a composite system
$\varrho^{AB} \in \mathfrak{A} \otimes \mathfrak{B}$,
$\varrho^{AB} = |\psi\rangle\langle\psi|$, where:

$$|\psi\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle^A \otimes |1\rangle^B - |1\rangle^A \otimes |0\rangle^B\big) \qquad (3.1)$$

is a state vector representing the preparation of two two-state
systems. This state may be created as pairs of photons of opposite
polarization emitted from a common source (*). The indices A and B
denote two different locations or observers, causally disconnected,
where each operator algebra is defined, respectively.

(*) Photons are massless spin one particles, so their polarization has
just two degrees of freedom, and can be modeled as a two-state system.
Given that the source is capable of providing an unlimited number of
pairs, the observer at location A performs a (projective) measurement
on its particles, $M^A(\varrho^A) = p^A$, and so does the observer at
location B, obtaining the distribution $M_i^B(\varrho^B) = p_i^B$,
$i = 1, 2$. The observer at location B is able to choose between two
different measurements, $M_1^B$ and $M_2^B$, and $M^A = M_1^B$. The two
measurements correspond to different orientations of the polarized
detector. From basic Quantum Mechanics it is not hard to see that these
orientations are orthogonal in real space.

If A and B choose to use the same measurement setup (detectors
polarized in the same direction), due to the structure of the state
$\varrho^{AB}$, whenever A measures its particle pointing upwards, B
will necessarily find it pointing downwards, and vice versa. If B uses
a detector polarized in an orthogonal direction, then its outcomes will
be uncorrelated to those of A, which comes from the fact that:

$$\langle M_{1,1}^B, M_{2,1}^B \rangle = \langle M_{1,1}^B, M_{2,2}^B \rangle = \langle M_{1,2}^B, M_{2,1}^B \rangle = \langle M_{1,2}^B, M_{2,2}^B \rangle = \frac{1}{2} \qquad (3.4)$$

that is, there will always be some probability overlap between the
outcomes in the different orientations.

The EPR paradox can be stated as follows. Suppose that at a first
stage, A and B are measuring their respective particles in different
directions, i.e. using different measurement setups. Their statistics
will be flat, $p^A = p^B = (\tfrac{1}{2}, \tfrac{1}{2})$ (see section
3.1.1). Now, suppose that, right before measuring its particles, B
always switches to its alternative setup without letting A know about
this change (they might be many lightyears apart), and measures in the
same direction as A does. No matter how far apart they happen to be, if
A gets $|0\rangle$, then B will get $|1\rangle$ with certainty. Their
outcomes will be correlated, yet they will not be aware of this
correlation unless they communicate their results, for their statistics
will remain flat.
Figure 3.1: EPR paradox. Each subsystem of the EPR pair is delivered
to a different observer. A and B are spatially separated, so the choice
of observer B should not influence the outcomes of A.

How the particle at A learns about the change in the orientation of the
detector at B, despite being causally disconnected from it, is the EPR
paradox. One must draw the conclusion that either:

• Quantum Mechanics violates the Principle of Locality, or 

• Quantum Mechanics is incomplete and some hidden-variable the- 
ory that supersedes Quantum Mechanics is needed to explain these 
non-classical correlations. 

3.1.1 Marginal Measurements 

So far we have mentioned $\varrho^{AB}$, $\varrho^A$ and $\varrho^B$ as
the density operators of the whole system and of its components,
respectively. The procedure to obtain the marginal density operators
from the joint one is analogous to that of classical probability. Let
$p^{AB} = \{p_{ij}^{AB}\}_{i,j=1}^{n,m}$ be the joint probability for
variables A and B (*). The marginal in A is obtained via:

$$p_i^A = \sum_{j=1}^{m} p_{ij}^{AB} \qquad (3.5)$$

(*) Here we don't assume that A and B are systems with the same
dimension, so n and m need not be equal.
To "sum out" one probability distribution is equivalent to ignoring
what is happening in the system associated to B. In Quantum Mechanics,
this deliberate ignorance amounts to performing the trivial
measurement, $M^B = \{\mathbb{1}\}$, in the system we want to ignore:

$$p_i^A = \langle \varrho^{AB}, M_i^A \otimes \mathbb{1} \rangle \qquad (3.6)$$

If with $\varrho_{aa',bb'}$ we denote the entries of $\varrho^{AB}$
corresponding to its subsystems, where $aa'$ represents the degrees of
freedom localized at A and $bb'$ those localized at B, then eq. 3.6 can
be developed:

$$p_i^A = \sum_{aa',bb'}^{n,m} \varrho_{aa',bb'}\,(M_i^A)_{a'a}\,\delta_{bb'}$$
$$= \sum_{aa'}^{n}\Big(\sum_{b}^{m} \varrho_{aa',bb}\Big)(M_i^A)_{a'a}$$
$$= \sum_{aa'}^{n} \varrho^A_{aa'}\,(M_i^A)_{a'a} = \langle \varrho^A, M_i^A \rangle \qquad (3.7)$$

where $\varrho^A_{aa'} = \sum_b \varrho_{aa',bb}$ is the so-called
reduced density operator, obtained by disregarding system B. The
operation of tracing out one of the subsystems is called the partial
trace of a state, and is denoted $\varrho^A = \mathrm{Tr}_B\, \varrho^{AB}$.

Thus, we have seen that the subalgebras of marginal measurements can be
obtained just by means of tensor-multiplying positive operators with
the identity.
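The partial trace in eq. 3.7 amounts to summing the joint density
matrix over the indices of B. A minimal sketch follows, assuming the
composite index convention `row = a * dim_b + b`; the function names are
illustrative.

```python
import math


def density(vector):
    """|v><v| for a state vector given as a flat list."""
    return [[x * complex(y).conjugate() for y in vector] for x in vector]


def partial_trace_b(rho, dim_a, dim_b):
    """Reduced operator on A: rho^A[a][a'] = sum_b rho[(a, b)][(a', b)]."""
    return [[sum(rho[a * dim_b + b][ap * dim_b + b] for b in range(dim_b))
             for ap in range(dim_a)] for a in range(dim_a)]


s = 1 / math.sqrt(2)
singlet = [0.0, s, -s, 0.0]     # the state of eq. 3.1
product = [0.0, 1.0, 0.0, 0.0]  # |0>|1>, an uncorrelated state

print(partial_trace_b(density(singlet), 2, 2))
print(partial_trace_b(density(product), 2, 2))
```

For the state of eq. 3.1 the reduced operator is half the identity,
which gives flat local statistics, whereas for the product state it is
the pure state $|0\rangle\langle 0|$.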

3.2 Quantum Correlations and Bell's Inequalities

EPR agreed that the predictions of Quantum Mechanics were indeed
correct, but ultimately explainable in terms of statistical
distributions of some "hidden variables", which would be in harmony
with the principles of locality and realism. This conjecture could
neither be proved nor refuted until the advent of Bell's inequalities
[16]. These inequalities have, a priori, nothing to do with Quantum
Mechanics, but rather put a constraint on the correlations predictable
by any theory that incorporates "local realism" (we use this name to
refer to both principles introduced above).

A bipartite system is said to be correlated if:

$$p^{AB} \neq p^A p^B \iff \langle \varrho^{AB}, M^A \otimes M^B \rangle \neq \langle \varrho^A, M^A \rangle \langle \varrho^B, M^B \rangle \qquad (3.8)$$

for some measurements $M^A$ and $M^B$. Here the inner product has to be
understood as componentwise products. Equation 3.8 is equivalent to
demanding that the density operator does not factorize:

$$\varrho^{AB} \neq \varrho^A \otimes \varrho^B \qquad (3.9)$$

A quantum state may exhibit two kinds of correlations: classical and
quantum. Classical correlations arise whenever a state is of the form:

$$\varrho^{AB} = \sum_k q_k\, \varrho_k^A \otimes \varrho_k^B \qquad (3.10)$$

with $\sum_k q_k = 1$. It is straightforward to check that expectation
values no longer factorize for these states. These states are known as
separable states. Quantum correlations are, once again, a consequence
of the Superposition Principle applied to composite systems: a (pure)
quantum-correlated state doesn't admit a convex decomposition as in the
previous expression, yet it still fulfils the condition expressed in
eq. 3.9. One example is the state used in the EPR paradox:

$$\varrho^{AB} = |\psi\rangle\langle\psi| = \frac{1}{2}\big(|0\rangle|1\rangle - |1\rangle|0\rangle\big)\big(\langle 0|\langle 1| - \langle 1|\langle 0|\big) \neq \varrho^A \otimes \varrho^B \qquad (3.11)$$

Such states are called entangled. Ascertaining whether it was possible
or not to describe entangled states in the context of a hidden variable
theory was the task of Bell's inequalities.

The assumption of local realism entails the existence of joint
probability distributions of a set of measurable quantities, regardless
of whether they are observed or not. Any system will be in a definite
state prior to being measured, which implies that the correlations
between measurements at two different locations may depend on any (in
general infinite) number of hidden variables $\lambda \in \Lambda$
(with $\Lambda$ a continuous set):

$$C(i,j) = \langle \varrho^{AB}, M_i^A \otimes M_j^B \rangle = \int_\Lambda f(M_i^A, \lambda)\, f(M_j^B, \lambda)\, p(\lambda)\, d\lambda \qquad (3.12)$$

where $p(\lambda)$ is a probability distribution, and
$f(M_i^A, \lambda)$ is the probability of measuring outcome $i$ in
system A, when the unknown hidden parameter is $\lambda$.

In an experiment proposed by Clauser, Horne, Shimony and Holt (CHSH)
[17] and carried out by Aspect and coworkers [18], it was possible to
test whether entangled states admit a hidden-variable model. (The CHSH
inequality applies to two-state systems; inequalities for general
systems have also been found. As an interesting case see [19].)
Consider the measurements $M_1^A$, $M_2^A$, $M_1^B(\theta)$, and
$M_2^B(\theta)$. These measurements have a probability overlap which
depends on the angle $\theta$ between the two different setups (see
fig. 3.2). They derived the following inequality, which holds for every
theory that incorporates local realism:

$$-2 \leq \big\langle \varrho^{AB},\; M_{1,i}^A \otimes \big(M_{1,j}^B(\theta) + M_{2,j}^B(\theta)\big) + M_{2,i}^A \otimes \big(M_{1,j}^B(\theta) - M_{2,j}^B(\theta)\big) \big\rangle \leq 2, \quad \forall i,j \qquad (3.13)$$



[Figure 3.2 labels: $M_1^A$, $M_2^A$, $M_1^B(\theta)$, $M_2^B(\theta)$,
angle $\theta$.]

Figure 3.2: Violation of Bell's inequalities 

Quantum Mechanics predicts a violation of this inequality for some
states. For the state $\varrho^{AB}$, the violation is maximum for a
suitable choice of the angle $\theta$, where we have that
$|2\sqrt{2}| \nleq 2$. The conclusion is that EPR were on the right
path: Quantum Mechanics is non-local.
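The quantum value $2\sqrt{2}$ can be reproduced with a short
computation. The sketch below is written under stated assumptions: it
uses $\pm 1$-valued spin observables in the x-z plane instead of the
POVM elements of eq. 3.13, and the four measurement angles are one
standard choice that attains the maximum, not necessarily the
parametrization of fig. 3.2.

```python
import math

S = 1 / math.sqrt(2)
SINGLET = [0.0, S, -S, 0.0]    # (|01> - |10>)/sqrt(2), indexed as [2*a + b]
PRODUCT = [0.0, 1.0, 0.0, 0.0]  # |0>|1>, no entanglement


def spin(angle):
    """Observable cos(angle) sigma_z + sin(angle) sigma_x, eigenvalues +1 and -1."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [s, -c]]


def correlation(psi, a_op, b_op):
    """<psi| a_op (x) b_op |psi> for a real two-qubit state vector."""
    return sum(psi[2 * i + j] * a_op[i][k] * b_op[j][l] * psi[2 * k + l]
               for i in range(2) for j in range(2)
               for k in range(2) for l in range(2))


def chsh(psi, a1, a2, b1, b2):
    """E(a1,b1) + E(a1,b2) + E(a2,b1) - E(a2,b2), the CHSH combination."""
    def e(a, b):
        return correlation(psi, spin(a), spin(b))
    return e(a1, b1) + e(a1, b2) + e(a2, b1) - e(a2, b2)


angles = (0.0, math.pi / 2, math.pi / 4, -math.pi / 4)  # assumed settings
print(chsh(SINGLET, *angles))
print(chsh(PRODUCT, *angles))
```

With these settings the entangled state reaches magnitude $2\sqrt{2}$,
while the product state stays within the local-realistic bound of 2.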



3.3 Information-Theoretic Considerations 

Von Neumann's Entropy is a generalization of Shannon's Entropy. It is
zero for pure states, i.e., rank one density operators. In the case of
bipartite systems, although the global system may be known with
certainty to be in a pure state, such as for $\varrho^{AB}$, its
marginal states, described by the reduced density operators
$\varrho^A = \mathrm{Tr}_B\, \varrho^{AB}$ and
$\varrho^B = \mathrm{Tr}_A\, \varrho^{AB}$, can be in a mixed state, so
that their von Neumann entropy will be nonzero. This, once again,
suggests that the whole system contains more information than the mere
sum of the information contained in its parts.

As we will see, a pure state is entangled if and only if the von
Neumann's entropy of any of its reduced density operators is nonzero.

One consequence of the above is that conditional quantum information
can be negative [20]. For pure entangled quantum states we have that
$S(\varrho^{AB}) = 0$, so that:

$$S(\varrho^{AB}) = S(\varrho^A) + S(\varrho^B|A) \qquad (3.14)$$

$$S(\varrho^B|A) = -S(\varrho^A) < 0 \qquad (3.15)$$

This "negative conditional information", with no counterpart in
classical Information Theory, can be given an operationally meaningful
interpretation. If $S(\varrho^B|A)$ is negative, then A can reproduce
the whole state $\varrho^{AB}$ just by means of classical
communication, which is equivalent to saying that quantum information
can be transferred from B to A using only classical bits [21].
Depending on its sign, conditional quantum information is the rate at
which entanglement is created or consumed while transferring the state
of B to A, and it is related to the quantum capacity of a channel.
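As a worked instance of eqs. 3.14 and 3.15, consider the EPR state of
eq. 3.1, taking logarithms in base 2 (an assumption about units). Its
reduced operator is maximally mixed, so the conditional entropy takes
its most negative value for a pair of qubits:

```latex
\varrho^{A} = \mathrm{Tr}_B\,\varrho^{AB} = \tfrac{1}{2}\,\mathbb{1},
\qquad
S(\varrho^{A}) = -2 \cdot \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit},
\qquad
S(\varrho^{AB}) = 0,
\\
S(\varrho^{B}|A) = S(\varrho^{AB}) - S(\varrho^{A}) = -1 \text{ bit}.
```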
3.4 Unexpected Applications

Given a total state that comprises locations A and B, an observer at
location A can, by performing a local POVM on its subsystem, exert an
influence upon the outcomes of observations at B (with A and B being
spatially separated). This fact was named quantum steering by
Schrödinger [22] [23]. It was shown [24] that a local POVM at A can
induce any ensemble $\{p_k, \varrho_k\}$ at B provided that the reduced
density operator at B admits a convex decomposition of the form
$\varrho^B = \sum_k p_k \varrho_k$. If, on the contrary, one could
actually change the marginal statistics, superluminal communication
would be achieved.



3.4.1 Quantum Teleportation 

One of the brightest consequences of quantum steering is quantum
teleportation. Provided that observers at A and B share an EPR pair
$\varrho^{AB}$, it is possible for them to teleport an unknown qubit
(*) in a pure state $\varphi^C$, initially at location A, to B using
just local operations and classical communication (LOCC).



Figure 3.3: Quantum Teleportation. A quantum system in state $\varphi$
disappears at location A, and after some classical information has been
sent from A to B, the system $\varphi$ appears at location B. A and B
are causally disconnected. Despite its name, it is a rather prosaic
effect, since it involves a protocol prescribed beforehand and requires
that A and B share an EPR pair.

(*) In general any qudit can be teleported. In this subsection and the
next one we will follow the original path of its discoverers and use
qubits.
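Initially, the global system will be described by the state
$\varrho^{AB} \otimes \varphi^C$. The observer at A chooses a
projective measurement
$M = \{M_i\}_{i=1}^4 \in \mathfrak{M}^A \otimes \mathfrak{M}^C$ whose
components are rank one operators such that:

$$\langle M_i, \varrho_j^{AC} \rangle = \delta_{ij} \qquad (3.16)$$

where $\varrho_j = |\psi_j\rangle\langle\psi_j|$ is any of the four
states forming a Bell basis (written below for the pair AB;
$\varrho_j^{AC}$ is the same state on the pair AC):

$$|\psi_1\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle^A \otimes |0\rangle^B + |1\rangle^A \otimes |1\rangle^B\big)$$
$$|\psi_2\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle^A \otimes |0\rangle^B - |1\rangle^A \otimes |1\rangle^B\big)$$
$$|\psi_3\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle^A \otimes |1\rangle^B + |1\rangle^A \otimes |0\rangle^B\big)$$
$$|\psi_4\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle^A \otimes |1\rangle^B - |1\rangle^A \otimes |0\rangle^B\big)$$

The EPR pair corresponds to the fourth vector of this basis. Once this
basis has been selected, as long as the unknown system is in a pure
state, it is possible to rewrite the initial state in the form [25]:

$$\varrho_4^{AB} \otimes \varphi^C = \frac{1}{4}\big[\varrho_4^{AC} \otimes \varphi^B + \varrho_3^{AC} \otimes R_z(\varphi^B) + \varrho_2^{AC} \otimes R_x(\varphi^B) + \varrho_1^{AC} \otimes R_y(\varphi^B)\big] \qquad (3.17)$$

Here $R_k(\varphi^B)$ denotes a rotation of the state $\varphi^B$ of
180 degrees around axis $k$; the cross terms between different Bell
outcomes, which are removed by the measurement, have been omitted.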

To teleport the unknown system, A performs the projective measurement
on both systems A and C, so that its outcome determines with certainty
in which state the system at B is. Now, all A has to do is to encode
its outcome in two bits and send them over to B. Once B receives the
information, it will be possible to rotate the state back in the
direction determined by the two bits. With 100% accuracy, the initial
unknown state will be obtained at a spatially separated location.

For mixed states, and for non-orthogonal POVMs, it is still possible
to teleport a system, but the process will necessarily be less
efficient.
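The protocol can be simulated with plain state vectors. The sketch
below assumes that the shared pair is the fourth Bell vector and that B
undoes the rotation with a Pauli matrix chosen by the outcome; the table
`CORRECTION` and the function names are illustrative, and the
outcome-to-rotation assignment is the one that this simulation
confirms, stated here as an assumption about the convention.

```python
import math

S = 1 / math.sqrt(2)
# Bell vectors psi_1..psi_4 in the order of the text, indexed as [2*first + second].
BELL = [[S, 0, 0, S], [S, 0, 0, -S], [0, S, S, 0], [0, S, -S, 0]]
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
# Pauli correction applied at B for outcomes psi_1..psi_4 (assumed convention).
CORRECTION = [SY, SX, SZ, I2]


def teleport(phi, outcome):
    """Project (A, C) on Bell vector `outcome`; return (probability, corrected B state)."""
    epr = BELL[3]        # shared pair on (A, B), indexed as [2*a + b]
    bell = BELL[outcome]  # measured vector on (A, C), indexed as [2*a + c]
    chi = [sum(complex(bell[2 * a + c]).conjugate() * phi[c] * epr[2 * a + b]
               for a in range(2) for c in range(2))
           for b in range(2)]
    prob = sum(abs(x) ** 2 for x in chi)
    u = CORRECTION[outcome]
    fixed = [sum(u[r][k] * chi[k] for k in range(2)) / math.sqrt(prob)
             for r in range(2)]
    return prob, fixed


def fidelity(u, v):
    return abs(sum(complex(x).conjugate() * y for x, y in zip(u, v))) ** 2


phi = [0.6, 0.8j]  # an arbitrary normalized qubit state
for k in range(4):
    prob, out = teleport(phi, k)
    print(k + 1, round(prob, 6), round(fidelity(phi, out), 6))
```

Each of the four outcomes occurs with probability 1/4, and after the
correction the state at B coincides with the input up to a global
phase.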
3.4.2 Quantum Superdense Coding

Another outstanding feat of quantum information is superdense coding
[26], by which A can send to B two bits of information encoded in a
single qubit (*).



Figure 3.4: Superdense Coding: Provided that A and B share an EPR
pair, A can change the global EPR state acting locally on its
subsystem. Then A sends its part $\varphi = \mathrm{Tr}_B\,\varrho^{AB}$ of the EPR state to B, who,
measuring on a joint basis of the two subsystems, can extract two bits
of information.

Suppose that A and B share an EPR pair $\varrho_4^{AB}$. The observer at A acts
on its subsystem so that the joint system evolves into one of the four
possible states:

$$(U_i^{A} \otimes I^{B})\,\varrho_4^{AB}\,(U_i^{A} \otimes I^{B})^{\dagger} = \varrho_i^{AB} \tag{3.18}$$

The four possible operations that A can apply to the joint system
are:

$$U_i^{A} \otimes I^{B}, \qquad U_i \in \{I, \sigma_x, \sigma_y, \sigma_z\}$$

where the $\sigma$'s are Pauli matrices. After the manipulation, the system
initially at A is sent to B. The observer at B chooses the projective
measurement $M = \{M_i\}_{i=1}^{4}$ on the joint system, with
$\langle M_i, \varrho_j^{AB} \rangle = \delta_{ij}$. Thus B will gain two bits of information,
although it received only one qubit.

^In fact, $2\log n$ bits can be sent using $n$-state systems.

Note that, whereas in quantum teleportation an unknown state was
measured at A and the outcomes were encoded in classical bits, in
superdense coding the "future outcomes" of a measurement at B are
encoded in a known qubit, which is later sent from A to B. The first
scenario is related to the capacity of transmitting quantum information
through a classical channel, and the second one is related to the
capacity of transmitting classical information through a quantum
channel. Underlying these two processes lies the same phenomenon:
Quantum Non-locality.
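
Superdense coding can be simulated in a few lines. This is a hedged sketch rather than the author's construction: the Bell basis is numbered as in Section 3.4.1, the shared pair is assumed to be the fourth Bell vector, and the assignment of local operations to messages in `ENCODE` is one possible choice (each entry maps the shared pair to the corresponding Bell vector up to a sign).

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
k0, k1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
s = 1 / np.sqrt(2)
BELL = [s * (np.kron(k0, k0) + np.kron(k1, k1)),
        s * (np.kron(k0, k0) - np.kron(k1, k1)),
        s * (np.kron(k0, k1) + np.kron(k1, k0)),
        s * (np.kron(k0, k1) - np.kron(k1, k0))]
# Local operation at A for each two-bit message (Z @ X equals i * sigma_y).
ENCODE = [Z @ X, X, Z, I2]

def send(message):
    """A encodes message in {0, 1, 2, 3}; B decodes with a joint Bell measurement."""
    state = np.kron(ENCODE[message], I2) @ BELL[3]   # A acts only on its own qubit
    probs = np.array([abs(np.vdot(b, state)) ** 2 for b in BELL])
    return int(np.argmax(probs)), probs
```

For each of the four messages the Bell measurement at B should single out one outcome with certainty, which is how two classical bits are recovered from one transmitted qubit.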

Chapter 4

Entanglement Theory



Entanglement is a new resource for communication that lies at the very
heart of Quantum Information Theory. Thus, a theory of Entanglement
which offers a qualitative description as well as quantitative measures
is highly desirable.

Mainly, two difficulties surround this task. The first one is to find a
meaningful measure of the entanglement contained in a state. One way
to obtain a suitable measure is to define beforehand some desirable
properties that it should have:

Scope An entanglement measure is a map from the set of composite
density operators to the positive real line:

$$E(\varrho^{AB}) \in \mathbb{R}_{+} \tag{4.1}$$

Normalization It should vanish only for separable states, and should
be maximum for maximally entangled states:

$$0 \le E(\varrho^{AB}) \le E(\varrho_1^{AB}) \tag{4.2}$$

Monotonicity $E(\varrho^{AB})$ should not increase under transformations
involving only local operations and classical communication (LOCC).
Consider the set of all LOCC transformations $\mathcal{T}_{LOCC}$. For any
transformation $T \in \mathcal{T}_{LOCC}$:

$$E(T(\varrho^{AB})) \le E(\varrho^{AB}) \tag{4.3}$$

As a consequence, one can derive the requirement of invariance
under local unitary evolution. Let $\varrho'^{AB} = (U \otimes V)\varrho^{AB}(U \otimes V)^{\dagger}$
for any two unitary operators $U$ and $V$; then:

$$E\big((U \otimes V)\varrho^{AB}(U \otimes V)^{\dagger}\big) \le E(\varrho^{AB}) \tag{4.4}$$

$$E\big((U \otimes V)^{\dagger}\varrho'^{AB}(U \otimes V)\big) \le E(\varrho'^{AB}) \tag{4.5}$$

whence we obtain $E(\varrho^{AB}) = E\big((U \otimes V)\varrho^{AB}(U \otimes V)^{\dagger}\big)$.

Convexity Such a desirable property arises naturally from the
reasonable assumption that mixing two states should not increase the
entanglement contained in them:

$$E(\lambda\varrho^{AB} + (1 - \lambda)\varphi^{AB}) \le \lambda E(\varrho^{AB}) + (1 - \lambda)E(\varphi^{AB}) \tag{4.6}$$

with $0 \le \lambda \le 1$.

Continuity Intuitively, if $\varrho^{AB}$ is slightly perturbed into $\varphi^{AB}$, the
subsequent change in the entanglement measure should be small. This
is expressed as:

$$\lim_{\|\varrho - \varphi\| \to 0} E(\varrho^{AB}) - E(\varphi^{AB}) = 0 \tag{4.7}$$

Subadditivity The communication tasks that one is able to perform
in the possession of several entangled pairs shouldn't be more than
the sum of those permitted individually by each pair:

$$E(\varrho^{AB} \otimes \varphi^{AB}) \le E(\varrho^{AB}) + E(\varphi^{AB}) \tag{4.8}$$

For the case when one has many copies of the same state, the
demand which is often encountered is that of weak subadditivity:

$$\frac{E(\varphi^{\otimes n})}{n} = E(\varphi) \tag{4.9}$$

The second difficulty is how to compute an entanglement measure
for any given state, which, as we will see, is a far from trivial task.

4.1 Pure States

For pure states, a satisfactory theory exists and there are procedures
both to detect and quantify the entanglement of a given state. The
von Neumann's entropy of any of the reduced density operators satisfies
the requirements stated above [27] [28] and is directly related to the
Schmidt's decomposition^ of the vector state. For a maximally entangled
symmetric state in dimension $N$, its Schmidt's decomposition is:

$$|\psi_1\rangle^{AB} = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |i\rangle^{A} |i\rangle^{B} \tag{4.10}$$

and the von Neumann's entropy of its reduced density operator attains
its maximum:

$$S(\mathrm{Tr}_B\,\varrho_1^{AB}) = \log N \tag{4.11}$$

If the state is separable, it will necessarily consist of two pure
states, one at each location, so the entropy of any of the local density
operators will be zero.
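
As an illustration of the relation between the Schmidt decomposition and the entropy of the reduced density operator, the sketch below obtains the Schmidt coefficients from a singular value decomposition. It is a minimal example under stated assumptions: the state is given as a vector of length `dim_a * dim_b` in the product basis, logarithms are taken in base 2, and the function name `entanglement_entropy` is illustrative.

```python
import numpy as np

def entanglement_entropy(psi, dim_a, dim_b):
    """Von Neumann entropy (in bits) of Tr_B |psi><psi|, via the Schmidt coefficients."""
    # The singular values of the coefficient matrix are the Schmidt coefficients.
    coeffs = np.linalg.svd(np.reshape(psi, (dim_a, dim_b)), compute_uv=False)
    probs = coeffs ** 2
    probs = probs[probs > 1e-15]          # convention: 0 log 0 = 0
    return float(-np.sum(probs * np.log2(probs)))
```

For the maximally entangled state (4.10) this should return $\log N$, and for a product state it should return zero.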

4.1.1 Entanglement Distillation 

The preference for the von Neumann's entropy of the reduced density
operator over other candidates relies also on another reason: it
quantifies the amount of maximally entangled states that one can obtain
from an arbitrarily large number of copies of an arbitrary entangled
pure state, by some LOCC transformation [29].

Theorem 11 [Entanglement Distillation] Given m identical copies of
one arbitrarily entangled pure state, $(\varphi^{AB})^{\otimes m}$, there exists an LOCC
scheme $T \in \mathcal{T}_{LOCC}$ such that it is possible to obtain $n \le m$ copies of a
maximally entangled state:

$$\lim_{m \to \infty} \left\| T\big((\varphi^{AB})^{\otimes m}\big) - (\varrho_1^{AB})^{\otimes n} \right\| = 0 \tag{4.12}$$

and the rate at which this can be done is given by the von Neumann's
entropy of any of the reduced density operators of the initial state:

$$R = \lim_{m \to \infty} \frac{n}{m} = S(\mathrm{Tr}_B\,\varphi^{AB}) \tag{4.13}$$

Our proof for this theorem will need typicality arguments, so it will
be given in the next chapter.

^The Schmidt's decomposition of a bipartite state is the projection of the state
onto an orthonormal product basis of the two Hilbert spaces. If a state is separable,
its vector state will be pointing parallel to one of the vectors of this basis. If it is
entangled, its vector state will have more than one component.

The importance of entanglement distillation is that most communica- 
tion tasks rely on maximally entangled states in order to yield acceptable 
transmission fidelities (see teleportation and superdense coding). Due
to the stochastic influence of channels on quantum information, it is 
very difficult to prevent a transmitted entangled particle from being 
corrupted with some noise, and protocols must be devised to restore the 
initial entanglement, at the expense of sending more particles. 

The converse procedure is called entanglement dilution, by which n 
copies of the maximally entangled state can be used to obtain m > n 
copies of an arbitrarily entangled pure state. However it doesn't seem 
to be as practical as entanglement distillation. 



4.2 Mixed States 

For the general case of mixed states, the theory of entanglement is far
from complete. There are several measures, based on inequivalent
criteria, and none of these candidates has yet prevailed. Our approach
to the study of entanglement focuses on two distinct areas, detection
and quantification of entanglement. These two concepts are tightly
interrelated, and this scheme is just a matter of taste.

It was shown that the separability problem, i.e. to ascertain whether
a given state is separable or entangled, belongs to the class of NP-hard
problems [30]. This means that, on a realistic basis, we should not
expect to measure (and detect) entanglement with arbitrary accuracy.
Several relaxations of the separability problem have been proposed in
the context of convex programming [31] [32] [33] [34]. Here we will study
more in depth the approach suggested in [35], which offers a geometric
intuition of the space of composite density operators.

For a thorough review of these concepts readers might check [36] [37].
4.2.1 Detection 

Several criteria have been proposed to check whether a state exhibits 
quantum or just classical correlations. Here we shall list some of them: 

Peres-Horodecki Criterion Peres showed that Positivity under Partial
Transposition (PPT) of the density operator was a necessary
condition for separability [38]. If a bipartite state $\varrho^{AB}$ is
separable, then:

$$(\varrho^{AB})^{T_B} = \sum_k p_k\,\varrho_k^{A} \otimes (\varrho_k^{B})^{T} \ge 0 \tag{4.14}$$

that is, it remains positive semidefinite under transposition of
just one of the local density operators. The Horodeckis traced this
argument back to the theory of positive maps [39], and demonstrated
that for systems of dimension 2 × 2 (two entangled qubits) and
2 × 3 (one qubit entangled with one qutrit) this criterion is also
a sufficient condition. That the separability problem is solved in
these cases, despite being NP-hard in general, is explained by the
fact that, for 2 × 2 and 2 × 3 systems, any positive map is of the
form $L = \mathbb{S}_1 + \mathbb{S}_2 \circ T$ ($\mathbb{S}_{1,2}$ are completely positive maps and $T$
is the transposition map), so no further search is needed to fully
characterize separable states.^

Majorization Another necessary condition for separability is the
Majorization Criterion, which, although less effective in detecting
entanglement than the PPT criterion, reveals a thermodynamic
aspect of non-locality. Consider the composite system $\varrho^{AB}$ and its
reduced density operator $\varrho^{A}$. Denote by $\lambda(\varphi)$ the vector of the
eigenvalues of $\varphi$, arranged in decreasing order. If $\varrho^{AB}$ is separable,
then:

$$\lambda(\varrho^{AB}) \prec \lambda(\varrho^{A}) \tag{4.15}$$

where $\prec$ is a pre-order relation meaning that the vector
$\lambda(\varrho^{A}) - \lambda(\varrho^{AB})$ lies inside some positive cone so that
$S(\varrho^{AB}) \ge S(\varrho^{A})$ [40] [41]. This, in turn, implies that if
$S(\varrho^{AB}) < S(\varrho^{A})$, then the state is entangled, so once again we see
that Quantum Mechanics allows information to be stored in a
composite system in a holistic manner, regardless of its parts.

^Positive maps are those maps for which $L(\varrho) \ge 0, \forall\varrho$. A positive map $L$ is
completely positive if and only if $(I_n \otimes L)(\varrho) \ge 0, \forall\varrho, n$. For classification, only
positive maps are interesting. It is easy to see that the PPT criterion relies on a
positive map.

Entanglement Witnesses In [39], the Horodeckis introduced Entanglement
Witnesses (EW). An EW is a Hermitian operator $W = W^{\dagger}$
such that:

$$\langle W, \sigma \rangle \ge 0, \text{ for all separable } \sigma \tag{4.16}$$

$$\langle W, \varrho \rangle < 0, \text{ for some entangled } \varrho \tag{4.17}$$

which is a consequence of the Hahn-Banach Theorem [42] in functional
analysis, since EW are directly related to positive maps [43]
defined on a Banach space. This is valid for algebras of arbitrary
dimension, so it will prove to be a very useful concept.
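
The PPT criterion and an entanglement witness can be compared on a simple family. The sketch below is an illustration under stated assumptions, not part of the original text: it uses the two-qubit Werner-type mixture of the singlet with the maximally mixed state (known to be entangled exactly when `p > 1/3`), and the witness `W = I/2 - |psi_4><psi_4|`, which is nonnegative on separable states because no separable state overlaps with a maximally entangled state by more than 1/2. All function names are illustrative.

```python
import numpy as np

def partial_transpose(rho, dim_a, dim_b):
    """Transpose only the B factor of an operator on C^dim_a (x) C^dim_b."""
    r = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    return r.transpose(0, 3, 2, 1).reshape(dim_a * dim_b, dim_a * dim_b)

def is_ppt(rho, dim_a, dim_b, tol=1e-12):
    """True if the partial transpose is positive semidefinite (up to tol)."""
    return bool(np.linalg.eigvalsh(partial_transpose(rho, dim_a, dim_b)).min() >= -tol)

SINGLET = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
WITNESS = np.eye(4) / 2 - np.outer(SINGLET, SINGLET)

def werner(p):
    """Mixture p |psi_4><psi_4| + (1 - p) I/4 of the singlet with white noise."""
    return p * np.outer(SINGLET, SINGLET) + (1 - p) * np.eye(4) / 4

def witness_value(rho):
    """<W, rho> = Tr(W rho); a negative value certifies entanglement."""
    return float(np.trace(WITNESS @ rho).real)
```

For this family both tests agree: the smallest eigenvalue of the partial transpose and the witness value are both equal to `(1 - 3p)/4`.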

As stated before, there is a duality between positive maps and
operators. In low dimensions (2 × 2 and 2 × 3), any positive map admits
the decomposition $L = \mathbb{S}_1 + \mathbb{S}_2 \circ T$, and any EW can be written as [43]:

$$W = (I \otimes L)(\varrho_1^{AB}) = P + (I \otimes T)Q \tag{4.18}$$

where $P$ and $Q$ are nonnegative operators. These EW receive the
name of decomposable EW. For $P = 0$ and $Q = I$ one gets the PPT
criterion. Nevertheless, for higher dimensions there exist EW which are
not of the form (4.18). A consequence is that there will be entangled
states for which $\langle W, \varrho \rangle \ge 0$ for every decomposable EW. These states
are called Positive Partial Transposed Entangled States (PPTES), and it
was shown in [44] that this kind of entangled states cannot be used in
distillation procedures. For this reason, the entanglement contained in
them is called Bound Entanglement, since it cannot be extracted for
communication tasks.

Other criteria to detect and quantify Bound Entanglement have been
proposed, such as non-decomposable EW [45], Schmidt number Witnesses
[46], Robust Semidefinite Programming [34] and, more recently,
a geometric approach based on separating hyperplanes [47]. We will
pursue this geometric interpretation of entanglement in the next section.



4.2.2 Quantification 

There exist several candidates, depending on which criterion one takes 
as more fundamental. Some of them have operational definitions and 
some have not. 



Entanglement cost, $E_C$ It quantifies how many maximally entangled
pairs are needed to generate a given entangled state, minimized
over all possible dilution protocols:

$$E_C(\varrho) = \min_{\mathcal{T}_{LOCC}} \lim_{m \to \infty} \frac{n}{m} \tag{4.19}$$

where $n \le m$ is the number of maximally entangled pairs, $\varrho_1^{AB}$,
whose entanglement is diluted into m copies of the original state.

Distillable entanglement, $E_D$ It is a measure of how many maximally
entangled pairs can be obtained by performing an optimal
distillation protocol on an asymptotic number m of copies of the
given state:

$$E_D(\varrho) = \max_{\mathcal{T}_{LOCC}} \lim_{m \to \infty} \frac{n}{m} \tag{4.20}$$

with n and m as before.



Relative entropy of entanglement, $E_R$ Analogously to its classical
counterpart, $E_R$ [48] can be thought of as a measure of the
extent to which one can confuse two probability distributions, a
result known as Sanov's Theorem (see [1]). In this case, it
quantifies to what extent an entangled state can be taken as
separable. The relative entropy of entanglement is:

$$E_R = \min_{\sigma\ \mathrm{separable}} S(\varrho \| \sigma) = \min_{\sigma\ \mathrm{separable}} \mathrm{Tr}(\varrho \log \varrho - \varrho \log \sigma) \tag{4.21}$$

Entanglement of formation, $E_F$ Any density operator has a non-unique
convex decomposition of the form $\varrho = \sum_k p_k \varrho_k$, where
$\varrho_k$ are rank-one density operators. Its entanglement of formation
is the averaged von Neumann's entropy of the reduced density
operators of those pure states, minimized over all possible convex
decompositions:

$$E_F = \min_{\{p_k, \varrho_k\}} \sum_k p_k\, S(\mathrm{Tr}_B\,\varrho_k) \tag{4.22}$$

As for the detection case, it is not known whether these measures are
equivalent. The values of these quantities are only known for some cases.
In the 2 × 2 case, the entanglement of formation can be exactly computed
thanks to a measure known as concurrence [49]. For any entanglement
measure $E(\varrho)$, $E_D(\varrho) \le E(\varrho) \le E_C(\varrho)$. For bound entangled states,
this is trivially satisfied.



4.3 Geometric Insights into Entanglement

The set of all density operators is a convex set, which follows from
probability arguments. Mixing cannot increase entanglement, hence the
set of separable density operators is also convex. We will denote these
two sets by $\mathbb{D}$ and $\mathbb{S}$, respectively.

It is easy to see that entanglement witnesses constitute hyperplanes
which split $\mathbb{D}$ into two subsets, one of which strictly contains $\mathbb{S}$. Let
$W = W^{\dagger}$ be an EW, then:

$$\langle W, \varrho \rangle < 0, \text{ for some } \varrho \in \mathbb{D} \setminus \mathbb{S} \tag{4.23}$$

$$\langle W, \sigma \rangle \ge 0, \ \forall \sigma \in \mathbb{S} \tag{4.24}$$

Figure 4.1: Set of composite states $\mathbb{D}$. $\mathbb{S}$ is a convex subset containing
all states that exhibit classical correlations. $\mathbb{D} \setminus \mathbb{S}$ contains all states
exhibiting quantum correlations. For a given density operator in $\mathbb{D}$,
its distance to the separable set is a measure of the entanglement that
it contains. An Entanglement Witness will separate $\mathbb{S}$ from a convex
subset of $\mathbb{D} \setminus \mathbb{S}$.

Clearly this defines a hyperplane dividing $\mathbb{D}$: $\langle W, \sigma \rangle > \langle W, \varrho \rangle$. In
our notation, an EW is optimal if (4.23) holds for the largest number of
$\varrho$'s. Intuitively, such an EW will be tangent to $\mathbb{S}$ (see fig. 4.1). We will
illustrate this fact in a moment.

A geometric measure of the entanglement contained in a state is its
distance to the set of separable density operators. The distance of a
density operator $\varrho$ to the separable set $\mathbb{S}$ is:

$$D = \min_{\sigma \in \mathbb{S}} \|\varrho - \sigma\| \tag{4.25}$$

4.3.1 Duality between Detection and Quantification 

It is a general result from geometric optimization that the problem of
finding a separating hyperplane between a point p and a convex set C
is dual to the problem of finding the distance between C and p [50]. In
matrix space language, this duality can be illustrated as follows. The
problem (4.25) can be expressed as:

$$\min \|\tau\| \quad \text{such that } \varrho - \sigma = \tau, \ \sigma \in \mathbb{S} \tag{4.26}$$

with variables $\tau$ and $\sigma$. The Lagrangian of (4.26) is:

$$\mathcal{L} = \|\tau\| + \langle W, \varrho - \sigma - \tau \rangle \tag{4.27}$$

where $W$ is the Lagrange multiplier associated with the equality
constraint. It is not hard to see that it represents a hyperplane. Noting
that $\langle W, \tau \rangle \le \|W\|\|\tau\|$, the dual function can be written as:

$$g(W) = \min\left[\langle W, \varrho \rangle - \langle W, \sigma \rangle + \|\tau\|(1 - \|W\| + \delta)\right] \tag{4.28}$$

where the parameter $\delta \ge 0$ is related to the relative orientation
between the hyperplane represented by $W$ and the line going from the
separable set $\mathbb{S}$ to the density operator $\varrho$, and it is equal to zero if and
only if they are perpendicular. For (4.28) to be bounded from below in
$\tau$, the additional constraint $\|W\| - \delta \le 1$ must be included. So the dual
problem of (4.26) is:

$$\max_{\|W\| - \delta \le 1} \left[\min_{\sigma \in \mathbb{S}} \left[\langle W, \varrho \rangle - \langle W, \sigma \rangle\right]\right] \tag{4.29}$$

It is straightforward to check that the optimal value of (4.29) is
attained if and only if $W$ is an optimal EW. This result, also known
as the Bertlmann-Narnhofer-Thirring Theorem (see Ref. [51]), will let
us trace a link between the entanglement detection and quantification
problems (compare also with Refs. [52] [53]).
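
The duality between distance and separating hyperplane can be seen in a toy setting. The sketch below is only an analogy under explicit assumptions: the convex set is taken to be the Euclidean unit ball instead of the separable set, so that the inner minimization in the dual problem has the closed form $\langle w, p \rangle - \|w\|$. The names `primal_distance` and `dual_value` are illustrative.

```python
import numpy as np

def primal_distance(p):
    """Distance from the point p to the closed unit ball (the toy convex set)."""
    return max(float(np.linalg.norm(p)) - 1.0, 0.0)

def dual_value(w, p):
    """min over s in the unit ball of <w, p - s>, which equals <w, p> - ||w||."""
    return float(w @ p - np.linalg.norm(w))

p = np.array([3.0, 4.0])              # a point outside the ball, ||p|| = 5
w_opt = p / np.linalg.norm(p)         # unit normal of the tangent (optimal) hyperplane
gap = primal_distance(p) - dual_value(w_opt, p)
```

Any other `w` with norm at most one gives a smaller dual value, so the maximum of the dual problem is attained by the hyperplane tangent to the set and equals the distance, in the same spirit as the optimal EW above.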






4.3.2 Ellipsoidal Classification 

The basic premise of this method is that the set of separable states $\mathbb{S}$ can
be approximated by a Minimum Volume Covering Ellipsoid (MVCE) of
an ensemble of vectors corresponding to some separable density
operators. Then, the following classification scheme can be adopted: if a
vector falls inside the MVCE, it will be taken as separable, and if it falls
outside, it will be regarded as entangled.

An ellipsoid centered at $x_c$ can be expressed as:

$$\mathcal{E} = \{x \mid (x - x_c)^{T} A (x - x_c) \le 1\} \tag{4.30}$$

where $A = A^{T}$ is a positive definite matrix whose dimension is that of
the real vector space used below. The volume of this ellipsoid is
proportional to $\det(A^{-1/2})$.

Since in matrix space quadratic forms are not defined, one needs to
work in a real vector space to build this ellipsoid. We first obtain an
ensemble of "separable vectors" by means of tensorially multiplying states
along all directions specified by some canonical basis. For instance, in the
2 × 2 case, this ensemble would be
$\{x_i^{sep}\} = \{(1,0,0) \otimes (1,0,0),\ (1,0,0) \otimes (-1,0,0),\ (1,0,0) \otimes (0,1,0),\ (1,0,0) \otimes (0,-1,0),\ \ldots,\ (0,0,-1) \otimes (0,0,-1)\}$.
Later on we will see that it is convenient to vary the norm $\|x_i^{sep}\|_2$ of
these vectors. This procedure ensures that all vectors will lie as spaced
as possible in the separable set $\mathbb{S}$. Secondly, we minimize the volume of
an ellipsoid, constrained to have all generated "separable vectors" falling
inside it. One way to obtain the MVCE of this ensemble would be to
solve the following problem:

$$\min \log\det A^{-1} \quad \text{such that } (x_i^{sep} - x_c)^{T} A (x_i^{sep} - x_c) \le 1 \tag{4.31}$$

with variables $A$ and $x_c$. Here, the logarithm was taken in order to
drop proportionality terms. Despite the exponential growth of the
dimension of the associated vector space, interior point methods used
for minimization still converge polynomially to a solution in dimension
as large as 1000, or more [50].
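
A covering ellipsoid of the form (4.30) can be computed without a convex-programming package. The sketch below is not the interior point approach referred to in the text: it swaps in Khachiyan's first-order algorithm, which approximates the same minimum volume ellipsoid, and the names `mvce` and `inside`, the tolerance and the iteration cap are all illustrative assumptions.

```python
import numpy as np

def mvce(points, tol=1e-3, max_iter=100000):
    """Approximate minimum volume covering ellipsoid {x : (x-c)^T A (x-c) <= 1}.

    points has shape (N, d). Uses Khachiyan's algorithm (an assumption of this
    sketch, not the interior point method mentioned in the text).
    """
    n, d = points.shape
    q = np.vstack([points.T, np.ones(n)])          # lifted points, shape (d+1, N)
    u = np.ones(n) / n                             # weights on the points
    for _ in range(max_iter):
        x = (q * u) @ q.T
        m = np.einsum('ij,ji->i', q.T @ np.linalg.inv(x), q)
        j = int(np.argmax(m))                      # point that sticks out the most
        step = (m[j] - d - 1.0) / ((d + 1.0) * (m[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        err = np.linalg.norm(new_u - u)
        u = new_u
        if err < tol:
            break
    c = u @ points                                 # center x_c
    cov = (points.T * u) @ points - np.outer(c, c)
    return np.linalg.inv(cov) / d, c

def inside(x, A, c, slack=1e-2):
    """Classification rule of the text: inside the ellipsoid means 'separable'."""
    return bool(float((x - c) @ A @ (x - c)) <= 1.0 + slack)
```

Because the algorithm stops at a finite tolerance, covered points may exceed the bound in (4.30) by a small amount, which is why `inside` accepts a slack.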

Figure 4.2: The vertices of the polytope are the generated "separable
vectors" of which the MVCE is found. The larger set corresponds to the
whole space of density operators.

4.3.3 Results for 2 x 2 and 2x3 Systems 

The separability problem is solved for 2 × 2 and 2 × 3 systems, thanks
to the PPT criterion. One can use this fact to benchmark the method.
The original problem (4.25) can be cast as:

$$\min \|\varrho - \sigma\|_F \quad \text{such that } \sigma^{T_B} \ge 0 \tag{4.32}$$

which gives the true results. The second problem is to find the
MVCE through (4.31), and compute the distance to this ellipsoid in a
similar way:

$$\min \|r - s\|_2 \quad \text{such that } (s - x_c)^{T} A (s - x_c) \le 1 \tag{4.33}$$

where $r$ and $s$ stand for the vectorized counterparts of $\varrho$ and $\sigma$. The
results obtained for pure vectors ($\|x_i^{sep}\|_2 = 1$) are rather
discouraging: whereas none of the generated "separable vectors" fell
outside the MVCE, only 12.7% of the "entangled vectors" are detected.
However, the ellipsoid can be shrunk by reducing the norm of the
generated ensemble $\{x_i^{sep}\}$. At the expense of letting some "separable
vectors" fall outside the ellipsoid, the number of correctly classified
"entangled vectors" increases. The event that a true "separable vector"
falls outside the MVCE will be a false positive, while if an "entangled
vector" falls inside the MVCE, it will be a false negative. Stepwise
reducing the norm of the vectors belonging to the separable ensemble,
Tables 4.1 and 4.2 are obtained.

2 × 2 Systems

Norm    False Positives    False Negatives
0.1     962                0
0.2     868                0
0.3     687                0
0.4     484                0
0.5     287                15
0.6     180                184
0.7     92                 410
0.8     32                 600
0.9     5                  755
1.0     0                  873

Table 4.1: Number of misclassified vectors in a sample of 1000 "separable
vectors" and 1000 "entangled vectors", as a function of the Euclidean
norm of the vectors of the separable ensemble
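
The error rates quoted in the discussion of Tables 4.1 and 4.2 follow directly from the tabulated counts. The sketch below recomputes them; it assumes that the cells left blank in the tables stand for zero (marked in the comments), and the helper names are illustrative.

```python
# Counts copied from Tables 4.1 and 4.2 (1000 "separable" and 1000 "entangled"
# vectors per row). Zeros are an assumption for cells that are blank in the tables.
NORMS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
FP_2X2 = [962, 868, 687, 484, 287, 180, 92, 32, 5, 0]
FN_2X2 = [0, 0, 0, 0, 15, 184, 410, 600, 755, 873]
FP_2X3 = [949, 812, 597, 427, 269, 160, 80, 34, 11, 0]
FN_2X3 = [0, 0, 0, 52, 196, 389, 572, 699, 807, 900]

def error_rates(false_pos, false_neg, per_class=1000):
    """Overall fraction of misclassified vectors for each norm."""
    return [(fp + fn) / (2 * per_class) for fp, fn in zip(false_pos, false_neg)]

def best_norm(false_pos, false_neg):
    """Norm of the separable ensemble with the lowest overall error rate."""
    rates = error_rates(false_pos, false_neg)
    i = min(range(len(rates)), key=rates.__getitem__)
    return NORMS[i], rates[i]
```

With these counts the lowest overall error occurs at norm 0.5 in both cases: 302 of 2000 vectors (15.1%) for 2 × 2 systems and 465 of 2000 (about 23.2%) for 2 × 3 systems.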

There is a trade-off between the number of correctly classified states
and the non-ambiguousness of the test. The relevant area for 2 × 2 systems
is between norms $\|x_i^{sep}\|_2 = 0.6$ and $\|x_i^{sep}\|_2 = 0.5$, as can be seen in
Fig. 4.3. A measure of entanglement ought to be as unambiguous as
possible, and thus the best choice is $\|x_i^{sep}\|_2 = 0.5$, since for this case
only about 1.5% of the "entangled vectors" are misclassified. For this
choice, in general, a vector will be misclassified 15.1% of the time. For
2 × 3 systems (see fig. 4.4), the MVCE approximates the separable set
somewhat less efficiently. However, still 76.8% of the vectors are
correctly classified, in the area comprised between $\|x_i^{sep}\|_2 = 0.5$ and
$\|x_i^{sep}\|_2 = 0.4$. In these systems, it misclassifies at least 5.2% of the
"entangled vectors".

2 × 3 Systems

Norm    False Positives    False Negatives
0.1     949                0
0.2     812                0
0.3     597                0
0.4     427                52
0.5     269                196
0.6     160                389
0.7     80                 572
0.8     34                 699
0.9     11                 807
1.0     0                  900

Table 4.2: Number of misclassified vectors in a sample of 1000 "separable
vectors" and 1000 "entangled vectors", as a function of the Euclidean
norm of the vectors of the separable ensemble



4.3.4 Pseudo-Entanglement Witnesses 

For a vector space endowed with the Euclidean norm, there is a simple
way to construct a tangent hyperplane to a given ellipsoid. We can use
this fact to build realistic observables amenable to a laboratory setting.

The tangent hyperplane to the ellipsoid can be expressed as:

$$\nabla_x\left[(x - x_c)^{T} A (x - x_c) - 1\right]^{T}_{x = s_0} (r - s_0) = 0 \tag{4.34}$$

where $s_0 = P_{\mathcal{E}}(r)$ is the projection of the vectorized density operator
under study onto the MVCE. It can be expressed in affine form as:

$$(s_0 - x_c)^{T} A (r - x_c) = 1 \tag{4.35}$$

(compare with Ref. [54]). It is important to keep in mind that,
although the hyperplanes introduced in (4.35) very much resemble an
Entanglement Witness, they are not so in general. This is because the
MVCE may in general be a proper subset of the separable set $\mathbb{S}$, and
no tangent hyperplane to this MVCE will strictly separate $\mathbb{S}$ from any
entangled state. Nevertheless, these Pseudo-EW can be used to estimate
the amount of entanglement contained in a given entangled matrix $\varrho$ via
(4.29), which at the optimal value will be equal to (4.26) [50]. For an
illustration of entanglement estimation see fig. 4.5.

Figure 4.3: False Negatives versus False Positives for 2 × 2 systems,
showing that there exists an area where the probability of wrongly
classifying a vector can be brought down to 15.1%, between
$\|x_i^{sep}\|_2 = 0.6$ and $\|x_i^{sep}\|_2 = 0.5$
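
The affine form of the tangent hyperplane is easy to build once a boundary point of the ellipsoid is known. The sketch below is illustrative only: it takes the boundary point `s0` as given (computing the projection onto the ellipsoid is not shown), assumes `A` is symmetric, and returns the hyperplane as a normal vector and an offset; `tangent_hyperplane` is a hypothetical helper name.

```python
import numpy as np

def tangent_hyperplane(A, x_c, s0):
    """Hyperplane {x : normal @ x = offset} tangent to the ellipsoid at s0.

    It rewrites (s0 - x_c)^T A (x - x_c) = 1, assuming A is symmetric and
    s0 lies on the boundary, i.e. (s0 - x_c)^T A (s0 - x_c) = 1.
    """
    normal = A @ (s0 - x_c)            # half the gradient of the quadratic form at s0
    offset = 1.0 + float(normal @ x_c)
    return normal, offset

def pseudo_witness_value(normal, offset, r):
    """Positive when r lies on the side of the hyperplane away from the ellipsoid."""
    return float(normal @ r) - offset
```

By the Cauchy-Schwarz inequality in the inner product defined by `A`, every point of the ellipsoid gives a value of at most zero, so a positive value places the vector outside the MVCE.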

4.3.5 Bound Entanglement Detection 

For composite systems of dimension higher than 6, there is a special
kind of entangled states that cannot be used, in principle, to enhance
communication. The entanglement contained in these states cannot be
distilled to obtain pure entangled states [44], and it receives the name
of Bound Entanglement (BE). The PPT criterion fails to detect this
kind of states, and it becomes only a necessary condition for
separability. Other criteria to detect and quantify BE have
been proposed, such as non-decomposable EW [45], Schmidt number
Witnesses [46], and, more recently, a geometric approach based on
separating hyperplanes [47].

Figure 4.4: False Negatives versus False Positives for 2 × 3 systems. The
error probability can be reduced to 23.2%, between $\|x_i^{sep}\|_2 = 0.5$ and
$\|x_i^{sep}\|_2 = 0.4$

The MVCE approach is in the spirit of the latter of the aforementioned
methods, but instead of hyperplanes, we shall use the MVCE in
order to detect BE. Intuitively, the ellipsoid covering a set of "separable
vectors" should leave bound entangled states on its outside. This fact
is studied in 3 × 3 systems, where a parametrization of bound entangled
states, due to P. Horodecki, is available [55]. These states $\varrho_{BE}$ depend
on a scalar $a \in [0, 1]$, and are given by:
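
For reference, a one-parameter 3 × 3 family of this kind can be built and tested numerically. The sketch below assumes that the parametrization meant here is the widely quoted one attributed to P. Horodecki, with normalization $1/(8a + 1)$; this identification is an assumption, and the function names are illustrative. The checks only confirm that the matrices are valid density operators with positive partial transpose, which is the property that makes the PPT criterion blind to them.

```python
import numpy as np

def horodecki_3x3(a):
    """Assumed form of the 3x3 family of Ref. [55], for 0 < a < 1."""
    b = (1.0 + a) / 2.0
    c = np.sqrt(1.0 - a * a) / 2.0
    rho = a * np.eye(9)
    for i, j in [(0, 4), (0, 8), (4, 8)]:      # coherences between |00>, |11>, |22>
        rho[i, j] = rho[j, i] = a
    rho[6, 6] = rho[8, 8] = b                  # block on |20>, |22>
    rho[6, 8] = rho[8, 6] = c
    return rho / (8.0 * a + 1.0)

def partial_transpose(rho, dim_a, dim_b):
    """Transpose only the B factor of an operator on C^dim_a (x) C^dim_b."""
    r = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    return r.transpose(0, 3, 2, 1).reshape(dim_a * dim_b, dim_a * dim_b)

def min_eigs(a):
    """Smallest eigenvalue of the state and of its partial transpose."""
    rho = horodecki_3x3(a)
    return (np.linalg.eigvalsh(rho).min(),
            np.linalg.eigvalsh(partial_transpose(rho, 3, 3)).min())
```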

Surprisingly, for norms of the generated separable ensemble of 0.6
and below, all bound entangled states are detected. The obtained results
are shown in Table 4.3.

As expected, the distance to the MVCE of the detected states
linearly depends on the norm of the associated density operator. This
interdependence is depicted in Fig. 4.6.

Figure 4.5: For 100 random "entangled vectors" of a 2 × 2 system, the
continuous black line is the true distance to the separable set $\mathbb{S}$, while
the dashed line stands for the distance of the vectors for a MVCE of
"separable vectors" of norm $\|x_i^{sep}\|_2 = 0.5$. At the bottom, the dotted
line represents the distances obtained for norm $\|x_i^{sep}\|_2 = 1$

3 × 3 Systems

Norm    Detected States
0.1     1000
0.2     1000
0.3     1000
0.4     1000
0.5     1000
0.6     1000
0.7     226
0.8     149
0.9     107
1.0     79

Table 4.3: 1000 bound entangled states were generated, with parameter
"a" running from 0.001 to 1. The distances of the associated vectors
to the different MVCEs were obtained. For norms of the separable
ensemble of 0.6 and below, all bound entangled states were detected

Figure 4.6: There is a linear dependence between the distance to the
MVCE and the norm of the associated density operator. (Horizontal
axis: norm of the bound entangled density operator; curves: distances
for $\|x^{sep}\|_2 = 0.1$ and for $\|x^{sep}\|_2 = 1$.)

Chapter 5



Classical Information over
Quantum Channels

Quantum Information Theory harbors the possibility of enhancing
communication with the help of a genuinely quantum resource: quantum
entanglement. Entanglement thus should be regarded as a new kind of
information. Hence it is natural to expect that different capacities can be
defined, depending on which kind of information (classical or quantum)
is to be sent over a channel.

The classical capacity $C$ is the asymptotic rate at which classical
information can be transmitted through a quantum channel. Depending
on whether we allow for quantum or classical coding and decoding, the
classical capacity unfolds into four different capacities [56] (see fig. 5.1).
We will set this fact aside and consider only a general $C$, which will be
the largest capacity among all coding and decoding possibilities. There is
another classical capacity, called entanglement-assisted classical capacity
$C_E$. It is the rate at which classical information can be sent over a
quantum channel provided that sender and receiver share an unlimited
amount of entangled pairs. We will see that this capacity is larger than
$C$.

There is also a quantum capacity $Q_1$ for transmitting intact general
quantum states. There is, in addition, a classically-assisted quantum
capacity, $Q_2$, for transmitting intact quantum states in parallel with a
classical feedback channel, which permits sender and receiver to perform
coordinated local operations to fight noise. The characterization of these
two quantum capacities is important because it determines how much
entanglement can be conveyed through a channel.

Here we will focus only on the former capacities, $C$ and $C_E$, since
they dwell more in the spirit of classical information theory, and are also
better understood. The results of quantum source coding [57] [58] will
also not be treated here for conciseness.



5.1 Quantum Asymptotic Equipartition Property

In analogy with i.i.d. processes in Chapter 2, it is possible to derive
typicality results for quantum systems. Suppose that a device outputs
a system in state $\varphi_i$ with probability $p_i$, with $\varphi_i$ necessarily orthogonal
rank-one density operators^. The entropy of the system will be
$S(\varrho = \sum_i p_i \varphi_i) = H(p)$. Now consider the m-fold tensor product
$\varphi_I = \varphi_{i_1} \otimes \varphi_{i_2} \otimes \cdots \otimes \varphi_{i_m}$, and call it sequence density operator. The
eigenvectors of this sequence density operator live in the space
$\mathcal{H}^{\otimes m} = \mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots \otimes \mathcal{H}_m$. Denote by $p_I = p_{i_1} p_{i_2} \cdots p_{i_m}$ the product of
all probabilities corresponding to a given sequence. The sequence $\varphi_I$
will be typical if:

$$\left| -\frac{1}{m} \log p_I - S(\varrho) \right| < \epsilon, \ \forall \epsilon \tag{5.1}$$

^This can always be done by diagonalizing the density operator.

Likewise, it is possible to define the typical subspace of $\mathcal{H}^{\otimes m}$ as the
subspace spanned by the eigenvectors of all typical sequences. The
orthogonal projector onto this subspace, $\Pi_\Lambda$, has the following
properties, where $\varrho^{\otimes m} = \sum_I p_I \varphi_I$ is the state of $m$ outputs and
$\Pi_\Lambda^{\perp} = I - \Pi_\Lambda$:

$$\langle \varrho^{\otimes m}, \Pi_\Lambda \rangle > 1 - \delta \tag{5.2}$$

$$\langle \varrho^{\otimes m}, \Pi_\Lambda^{\perp} \rangle < \delta, \ \forall \delta \tag{5.3}$$

that is, for sufficiently large m, almost all the probability is contained
in the typical subspace. The dimension of the typical subspace will be
bounded by:



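$$(1 - \delta)\,2^{m(S(\varrho) - \epsilon)} \le \dim(\Pi_\Lambda) \le 2^{m(S(\varrho) + \epsilon)} \tag{5.4}$$

or equivalently,

$$2^{-m(S(\varrho) + \epsilon)} \le p_I \le 2^{-m(S(\varrho) - \epsilon)} \tag{5.5}$$

which means that as m grows, all typical sequences will tend to be
equiprobable.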
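
These typicality statements can be checked for a qubit source. The sketch below is a minimal illustration assuming a density operator with eigenvalues `p` and `1 - p`, logarithms in base 2, and the typicality condition (5.1) for a fixed `eps`; sequences are grouped by the number `k` of factors carrying eigenvalue `p`, and the function name is illustrative.

```python
from math import comb, log2

def typical_subspace(p, m, eps):
    """Entropy, dimension and total probability of the eps-typical subspace."""
    entropy = -p * log2(p) - (1 - p) * log2(1 - p)
    dim, prob = 0, 0.0
    for k in range(m + 1):                      # k factors with eigenvalue p
        log_p_seq = k * log2(p) + (m - k) * log2(1 - p)
        if abs(-log_p_seq / m - entropy) < eps: # condition (5.1)
            dim += comb(m, k)                   # number of such sequences
            prob += comb(m, k) * 2.0 ** log_p_seq
    return entropy, dim, prob
```

For instance, with `p = 0.1`, `m = 1000` and `eps = 0.1` nearly all of the probability falls in the typical subspace, whose dimension stays well below the full `2**m`.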

5.1.1 Entanglement Distillation 

These typicality results will allow us to prove Theorem 11. For this we
will state the following lemma, whose proof can be found in [40]:

Lemma 1 A bipartite pure state $\varphi^{AB}$ can be transformed into another
pure state $\varrho^{AB}$ by LOCC if and only if the eigenvalues of their reduced
density operators satisfy $\lambda(\varphi^{A}) \prec \lambda(\varrho^{A})$

Proof 14 (Proof of Theorem 11) Suppose we have m copies of $\varphi^{AB}$, so
that its reduced density operator is $(\mathrm{Tr}_B\,\varphi^{AB})^{\otimes m} = \sum_I p_I \varphi_I$. As m
grows, the eigenvalues of this density operator will satisfy eq. 5.5.

The reduced density operator of a maximally entangled state has
maximum entropy $S(\mathrm{Tr}_B\,\varrho_1^{AB}) = \log 2 = 1$. Now consider n copies of the
maximally entangled state, $(\varrho_1^{AB})^{\otimes n}$. As n grows, the eigenvalues of
$(\mathrm{Tr}_B\,\varrho_1^{AB})^{\otimes n}$ will also satisfy eq. 5.5, so they will be constrained to take
values arbitrarily close to $2^{-n}$.

We have the inequality:

(5.6)

and by the previous lemma, if the entropy $S(\mathrm{Tr}_B\,\varphi^{AB}) \simeq \frac{n}{m}$, then it will
be possible to transform m copies of the arbitrarily entangled state $\varphi^{AB}$
into n copies of $\varrho_1^{AB}$. This argument can be carried out symmetrically
considering the other subsystem.

5.2 Quantum Channels 

A quantum channel $\mathcal{C}$ is a map from one algebra into another:

$$\mathcal{C}: \mathfrak{A} \longrightarrow \mathfrak{B}$$

Classically, a channel induces some noise due to the stochastic
nature of its associated transition matrix. In Quantum Mechanics time
evolution of a closed system is completely deterministic. However, the
system will generally couple to inaccessible degrees of freedom
corresponding to dynamical variables of its environment. At the end, only
the system will be observed. Tracing out the environment can introduce
noise in the resulting density operator. The effect of a channel $\mathcal{C}$ on $\varrho$
can be expressed as:

$$\mathcal{C}(\varrho) = \mathrm{Tr}_E\big(U(\varrho \otimes \varrho_1^{E})U^{\dagger}\big) \tag{5.7}$$

where $\varrho_1^{E}$ is the initial pure state of the environment, and $U$ is some
time evolution operator acting on the global algebra, which may, or may
not, couple the system and its environment. Some physical requirements
are:

• It should be a completely positive map. This stems from the fact that if $\varrho^{AB}$ is the state of a composite system, and only one of the subsystems is sent over the channel, the result should still be a density operator. This has profound consequences, such as the duality between channels and entangled states.

• It should be a trace preserving map, $\mathrm{Tr}(\mathcal{C}(\varrho)) = 1$. This is the demand that any POVM on the channel's output is normalized to one^1.

• It should be a convex-linear map, $\mathcal{C}(\sum_i p_i \varrho_i) = \sum_i p_i \mathcal{C}(\varrho_i)$. In other words, the effect of the channel should not depend on the convex decomposition of the input density operator, in much the same way that classically a transition matrix doesn't depend on the input probability distribution.

^1 Non-trace preserving maps are interesting to describe measurements as a channel from the algebra of quantum systems to the algebra of classical systems. Non-trace preserving maps are also interesting when for some reason there is a probability leakage in the channel, i.e. when sometimes the channel produces no output at all, so $\mathrm{Tr}(\mathcal{C}(\varrho)) < 1$.

It turns out [9] that these requirements are necessary and sufficient to arrive at the operator sum representation of channels:

\[ \mathcal{C}(\varrho) = \sum_{i=1}^{N} A_i \varrho A_i^\dagger \tag{5.8} \]

with $N \leq \dim(\mathfrak{A})\dim(\mathfrak{B})$. The operators $A_i$, which map the input space to the output space, are called Kraus operators. The channel is trace preserving, hence:

\[ \sum_{i=1}^{N} A_i^\dagger A_i = \mathbb{1} \tag{5.9} \]

Consider an orthogonal resolution of the identity $\{\varrho_i^E = |e_i\rangle\langle e_i|\}_{i=1}^{N}$ for the environment^2, and suppose for simplicity that $\varrho = |x\rangle\langle x|$ and that the environment starts in the state $|e_1\rangle\langle e_1|$. Time evolution will correlate the state with its environment. Since the environment cannot be measured, this correlation cannot be exploited. Tracing out the environment it is easy to check that:

\[
\begin{aligned}
\mathrm{Tr}_E\left( U (|x\rangle\langle x| \otimes |e_1\rangle\langle e_1|) U^\dagger \right)
&= \sum_{i=1}^{N} \langle e_i| \left[ U (|x\rangle\langle x| \otimes |e_1\rangle\langle e_1|) U^\dagger \right] |e_i\rangle \\
&= \sum_{i=1}^{N} A_i |x\rangle\langle x| A_i^\dagger
\end{aligned} \tag{5.10}
\]

with $A_i = \langle e_i|U|e_1\rangle$.

^2 Here we assume implicitly that at most $d^2$ dimensions are necessary to model the environment for one system of dimension $d$.

There is a straightforward interpretation of eq. 5.8. From linearity and the assumption that the channel is trace preserving, $\mathrm{Tr}(A_i \varrho A_i^\dagger) = p_i$ is the probability that the channel outputs $\varrho_i$. Using Bayes' rule we have that $\varrho_i = \frac{A_i \varrho A_i^\dagger}{\mathrm{Tr}(A_i \varrho A_i^\dagger)}$, so:

\[ \mathcal{C}(\varrho) = \sum_i p_i \varrho_i \tag{5.11} \]

Thus, the stochastic behavior of the channel comes from the interaction of the sent system with its environment. In fact, for ideal channels, i.e. for cases where the system and its environment don't interact, evolution is unitary and only one Kraus operator is needed to define the channel. However, this is not of much relevance, since the mere fact of observing a system can be described as a highly non-ideal channel (see 2.3.1).
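As an illustration of eqs. 5.8, 5.9 and 5.11, the sketch below applies a qubit bit-flip channel in its operator sum form. The bit-flip channel and all helper names are assumptions chosen for this example (the text does not single out a particular channel), and matrices are kept as plain nested lists so that the block stays self-contained.

```python
from math import sqrt


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def bit_flip_kraus(p):
    """Kraus operators sqrt(1-p)*I and sqrt(p)*X (illustrative choice)."""
    keep, flip = sqrt(1 - p), sqrt(p)
    return [[[keep, 0], [0, keep]], [[0, flip], [flip, 0]]]


def completeness(kraus):
    """Left-hand side of eq. 5.9: the sum of A_i^dagger A_i."""
    total = [[0, 0], [0, 0]]
    for a in kraus:
        total = add(total, matmul(dagger(a), a))
    return total


def apply_channel(kraus, rho):
    """Eq. 5.8: the sum of A_i rho A_i^dagger."""
    out = [[0, 0], [0, 0]]
    for a in kraus:
        out = add(out, matmul(matmul(a, rho), dagger(a)))
    return out


def branch_probabilities(kraus, rho):
    """p_i = Tr(A_i rho A_i^dagger), the weights appearing in eq. 5.11."""
    return [trace(matmul(matmul(a, rho), dagger(a))).real for a in kraus]


if __name__ == "__main__":
    kraus = bit_flip_kraus(0.25)
    rho = [[1, 0], [0, 0]]  # |0><0|
    print(completeness(kraus))
    print(apply_channel(kraus, rho))
    print(branch_probabilities(kraus, rho))
```

With a single Kraus operator (p = 0) the map reduces to a unitary evolution, which is the ideal-channel case mentioned above.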



5.3 Classical Capacity of a Quantum Channel 

In a classical communication setting, the inference capability of the receiver is related to the mutual information between the sender's and the receiver's probability distributions. This capability will depend on the nature of the channel: for a noiseless channel the mutual information attains a maximum $I(p^X; p^Y) = H(p^X)$, with $p^Y = f(p^X)$. For a noisy channel, mutual information can be maximized by finding the optimal probabilities of the input code.

One might want to use quantum states to encode classical bits. What makes this scenario interesting is that the quantum states may not in general be orthogonal. In fact, this can be desirable in some cases [59] [60]. Holevo's bound (see section 2.3.3) then tells us that the accessible information is bounded by:

\[ I(X : Y) \leq S(\varrho^X) - \sum_{i=1}^{n} p_i^X S(\varrho_i) \tag{5.12} \]

where $\varrho^X = \sum_i p_i^X \varrho_i$ is the average input state.
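To make the bound of eq. 5.12 concrete, the following sketch evaluates its right-hand side for two equiprobable qubit states, once for an orthogonal pair and once for a non-orthogonal pair. It is only an illustration under stated assumptions: the states are restricted to real symmetric 2x2 density matrices so that the eigenvalues have a closed form, and the function names are hypothetical.

```python
from math import log2, sqrt


def entropy_2x2(rho):
    """Von Neumann entropy (in bits) of a real symmetric 2x2 density matrix."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    gap = sqrt(max(tr * tr - 4 * det, 0.0))
    eigenvalues = [(tr + gap) / 2, (tr - gap) / 2]
    return -sum(x * log2(x) for x in eigenvalues if x > 1e-15)


def holevo_quantity(probs, states):
    """Right-hand side of eq. 5.12: S(average state) minus the average entropy."""
    average = [[sum(p * s[i][j] for p, s in zip(probs, states))
                for j in range(2)] for i in range(2)]
    return entropy_2x2(average) - sum(
        p * entropy_2x2(s) for p, s in zip(probs, states))


ZERO = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
ONE = [[0.0, 0.0], [0.0, 1.0]]    # |1><1|
PLUS = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|, not orthogonal to |0>

if __name__ == "__main__":
    print(holevo_quantity([0.5, 0.5], [ZERO, ONE]))    # orthogonal pair
    print(holevo_quantity([0.5, 0.5], [ZERO, PLUS]))   # non-orthogonal pair
```

The orthogonal pair reaches one bit, while the non-orthogonal pair stays near 0.60 bits, so the bound by itself already limits what any measurement can extract from non-orthogonal signals.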

Figure 5.1: Quantum Channel. Two classical bits are encoded in four qubits and sent over the channel. Channel inputs can be entangled or not. The measurement at the decoder is usually over the joint qubit sequence.

This suggests that, in contrast to classical channel coding, quantum channel coding demands that two optimizations be carried out to find the optimal performance of a channel. First, the optimal measurement strategy that maximizes 5.12 must be found; this is a search in the set of measurements $\mathfrak{M}$. Second, the optimal input probability is to be found in the classical probability simplex, just as in the classical case. We can now define the classical capacity of a quantum channel as:

\[ C = \max_{p^X} \max_{M \in \mathfrak{M}} I(X : Y) \tag{5.13} \]

where the measurement $M$ turns the channel output into the classical output distribution $q$. A direct consequence of the No-Cloning Theorem (see 2.3.2) is that, when the system appears at the receiver's side, it must have disappeared at the sender's side. Since the random variables $X$ and $Y$ are directly related to the same system at different times, a joint probability distribution lacks a truly consistent meaning. Hence, the concept of mutual information is meaningless, as long as it refers to the information that one system contains about itself prior to having been sent through a channel. As we will see, this subtlety precludes the use of joint typicality arguments in coding theorems of quantum channels.

The optimal measurement strategy is given by Holevo's bound, so we have:

\[ C = \max_{p^X} \left[ S(\varrho^X) - \sum_{i=1}^{n} p_i^X S(\varrho_i) \right] \tag{5.14} \]
Following [61] [62] [63], we state a coding theorem for quantum noisy channels:

Theorem 12 [Holevo-Schumacher-Westmoreland Theorem] A quantum noisy channel $\mathcal{C} : \mathfrak{A} \to \mathfrak{B}$ can be used to transmit information reliably if and only if $R < C$, with capacity defined as:

\[ C = \max_{p^X} \left[ S(\mathcal{C}(\varrho^X)) - \sum_{i=1}^{n} p_i^X S(\mathcal{C}(\varrho_i)) \right] \tag{5.15} \]

where $\varrho_i$ are the input states and $\sigma_i = \mathcal{C}(\varrho_i)$ represent the states to be measured by the decoder.

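Proof 15 ($\Rightarrow$ Proof of Achievability) Suppose that the message $w = (i_1, i_2, \ldots, i_m)$ is to be sent. The sender will construct $\varrho_w = \varrho_{i_1} \otimes \varrho_{i_2} \otimes \cdots \otimes \varrho_{i_m}$. The channel's output will be $\sigma_w = \mathcal{C}^{\otimes m}(\varrho_w)$.

The probability of successfully identifying $\sigma_w$ is $\langle \sigma_w, M_w \rangle$, where $M_w$ is a measurement for index $w$. An error will be declared with probability:

\[ p_e = 2^{-mR} \sum_{w=1}^{2^{mR}} \left( 1 - \langle \sigma_w, M_w \rangle \right) \tag{5.16} \]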

Classically, one could resort to joint typicality arguments to build a proof. Since quantum physics prevents us from considering the mutual information of two distributions that exist at different times, we cannot follow this route. Instead we will consider two different applications of typicality: one concerning which sequence density operators are most likely (much as in the classical setting), and the other quantifying how many sequences can be considered to be "close" to a fixed sequence density operator.

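Let $\sigma = \sum_i p_i \sigma_i$ be the average output of the channel, with spectral decomposition $\sigma = \sum_j \lambda_j |e_j\rangle\langle e_j|$. Consider the $m$-fold tensor product $\sigma^{\otimes m} = \sum_J \lambda_J |e_J\rangle\langle e_J|$, with $J = (j_1, j_2, \ldots, j_m)$. Define $\Pi_\Lambda = \sum_{J \in \mathcal{T}} |e_J\rangle\langle e_J|$ as the projector onto the typical subspace $\Lambda_\sigma \subset \mathcal{H}^{\otimes m}$ spanned by the eigenvectors of all typical sequence density operators:

\[ \mathcal{T} = \{ J : 2^{-m(S(\sigma)+\delta)} < \lambda_J < 2^{-m(S(\sigma)-\delta)} \} \tag{5.17} \]

Then:

\[ \langle \sigma^{\otimes m}, \Pi_\Lambda \rangle > 1 - \delta \tag{5.18} \]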
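The behavior claimed in eqs. 5.17 and 5.18 can be observed numerically. The sketch below is an illustration under a simplifying assumption: the average output is taken to be a qubit state with eigenvalues $p$ and $1-p$, so that the eigenvalues of the $m$-fold tensor product are $p^k (1-p)^{m-k}$ with binomial multiplicities. The function names are hypothetical.

```python
from math import comb, log2


def binary_entropy(p):
    """Entropy (in bits) of a qubit state with eigenvalues p and 1 - p."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


def typical_subspace(p, m, delta):
    """Return (weight, dimension) of the typical subspace of eq. 5.17.

    weight    = trace of the m-fold tensor power against the typical projector
    dimension = number of typical eigenvectors
    """
    entropy = binary_entropy(p)
    weight, dimension = 0.0, 0
    for k in range(m + 1):  # k factors carry eigenvalue p, m - k carry 1 - p
        log_eigenvalue = k * log2(p) + (m - k) * log2(1 - p)
        if -m * (entropy + delta) < log_eigenvalue < -m * (entropy - delta):
            multiplicity = comb(m, k)
            dimension += multiplicity
            # accumulate in the log domain to avoid overflow for large m
            weight += 2.0 ** (log2(multiplicity) + log_eigenvalue)
    return weight, dimension


if __name__ == "__main__":
    for m in (20, 100, 400):
        weight, dimension = typical_subspace(0.9, m, 0.1)
        print(m, round(weight, 4), round(log2(dimension) / m, 4))
```

As $m$ grows the weight approaches one, while the dimension stays below $2^{m(S(\sigma)+\delta)}$, which is the counting fact used later in the proof.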

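Now, let $\sigma_w$ be the output sequence of the channel. It has a spectral decomposition:

\[ \sigma_w = \sum_J \lambda_J^w |e_J^w\rangle\langle e_J^w| \tag{5.19} \]

$\sigma_w$ will be the tensor product of about $m p_1$ copies of $\sigma_1$, $m p_2$ copies of $\sigma_2$, and so on. Define the average per symbol entropy of the sequence as $\bar{S}(\sigma_w) = \sum_i p_i S(\sigma_i)$. Interchanging the two definitions of entropy, we build the projector $\Pi_w = \sum_{J \in \mathcal{T}_w} |e_J^w\rangle\langle e_J^w|$, where:

\[ \mathcal{T}_w = \{ J : 2^{-m(\bar{S}(\sigma_w)+\delta)} < \lambda_J^w < 2^{-m(\bar{S}(\sigma_w)-\delta)} \} \tag{5.20} \]

Then:

\[ \langle \sigma_w, \Pi_w \rangle > 1 - \delta \tag{5.21} \]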

The next thing to do is to define the POVM associated to the decoder, $M = \{M_w\}_{w=1}^{2^{mR}}$. Each component of the POVM should be very close to the typical projector $\Pi_w$. Since only typical sequences will be assigned a codeword:

\[ M_w = \Big( \sum_{w'} \Pi_\Lambda \Pi_{w'} \Pi_\Lambda \Big)^{-1/2} \Pi_\Lambda \Pi_w \Pi_\Lambda \Big( \sum_{w'} \Pi_\Lambda \Pi_{w'} \Pi_\Lambda \Big)^{-1/2} \tag{5.22} \]

which makes sure that no non-typical sequence will be considered inside the typical subspace of the sequence $\sigma_w$. The generalized inverse square roots^3 are introduced for normalization.
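The normalization provided by the generalized inverse square roots can be made explicit. The short derivation below is a sketch in which the shorthand $X$ is introduced only for this illustration; it shows that the components $M_w$ sum to a projector, so that they form a valid POVM once completed with the complementary "failure" outcome.

```latex
% Shorthand (introduced here for illustration only):
%   X = \sum_{w'} \Pi_\Lambda \Pi_{w'} \Pi_\Lambda, a positive semidefinite operator.
\begin{aligned}
\sum_{w} M_w
  &= X^{-1/2} \Big( \sum_{w} \Pi_\Lambda \Pi_{w} \Pi_\Lambda \Big) X^{-1/2} \\
  &= X^{-1/2} \, X \, X^{-1/2} \\
  &= P_{\mathrm{supp}\,X} \;\leq\; \mathbb{1},
\end{aligned}
% where P_{supp X} is the projector onto the support of X, because the
% generalized inverse vanishes on Ker X. Each M_w is positive semidefinite,
% being of the form B^\dagger \Pi_w B with B = \Pi_\Lambda X^{-1/2}.
```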

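Now that the two concepts of typicality are introduced, return to eq. 5.16:

\[
\begin{aligned}
p_e &= 2^{-mR} \sum_{w=1}^{2^{mR}} \Big[ 1 - \sum_{J} \lambda_J^w \sum_{J' \in \mathcal{T}_w} |\alpha_{(w,J),(w,J')}|^2 \Big] \\
&\leq 2^{-mR} \sum_{w=1}^{2^{mR}} \Big[ \sum_{J \in \mathcal{T}_w} \lambda_J^w \big( 1 - \alpha_{(w,J),(w,J)}^2 \big) + \sum_{J \notin \mathcal{T}_w} \lambda_J^w \Big] \\
&\leq 2^{-mR} \sum_{w=1}^{2^{mR}} \Big[ 2 \sum_{J \in \mathcal{T}_w} \lambda_J^w \big( 1 - \alpha_{(w,J),(w,J)} \big) + \sum_{J \notin \mathcal{T}_w} \lambda_J^w \Big]
\end{aligned} \tag{5.23}
\]

where

\[ \alpha_{(w,J),(w',J')} = \langle e_J^w | \, \Pi_\Lambda \Big( \sum_{w''} \Pi_\Lambda \Pi_{w''} \Pi_\Lambda \Big)^{-1/2} \Pi_\Lambda \, | e_{J'}^{w'} \rangle \]

The first inequality follows from omitting some non-positive cross terms and the relation $\sum_J \lambda_J^w = 1$. The second inequality comes from considering the componentwise inequality $(1 - x^2) = (1 + x)(1 - x) \leq 2(1 - x)$, $x \in [0,1]$.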

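Once we realize that the $\alpha_{(w,J),(w',J')}$ are the entries of the square root of the Gram matrix $\Gamma = [\langle e_J^w | \Pi_\Lambda | e_{J'}^{w'} \rangle] = [\gamma_{(w,J),(w',J')}]$, it is possible to express the first term of the last member in eq. 5.23 as:

\[
\begin{aligned}
2 \sum_{w=1}^{2^{mR}} \sum_{J \in \mathcal{T}_w} \lambda_J^w \big( 1 - \alpha_{(w,J),(w,J)} \big)
&= 2\, \mathrm{Tr}_{(w,J)} \big( \Lambda (\mathbb{1} - \Gamma^{1/2}) \big) \\
&= \mathrm{Tr}_{(w,J)} \big( \Lambda (\mathbb{1} - \Gamma^{1/2})^2 \big) + \mathrm{Tr}_{(w,J)} \big( \Lambda (\mathbb{1} - \Gamma) \big) \\
&\leq \mathrm{Tr}_{(w,J)} \big( \Lambda (\mathbb{1} - \Gamma)^2 \big) + \mathrm{Tr}_{(w,J)} \big( \Lambda (\mathbb{1} - \Gamma) \big) \\
&= \sum_{w=1}^{2^{mR}} \sum_{J \in \mathcal{T}_w} \lambda_J^w \Big[ 2 - 3\gamma_{(w,J),(w,J)} + \sum_{w'} \sum_{J' \in \mathcal{T}_{w'}} \gamma_{(w,J),(w',J')} \gamma_{(w',J'),(w,J)} \Big] \\
&= \sum_{w=1}^{2^{mR}} \sum_{J \in \mathcal{T}_w} \lambda_J^w \Big[ 2 - 3\gamma_{(w,J),(w,J)} + \gamma_{(w,J),(w,J)}^2 + \sum_{(w',J') \neq (w,J)} |\gamma_{(w,J),(w',J')}|^2 \Big]
\end{aligned} \tag{5.24}
\]

where $\Lambda = \mathrm{diag}(\lambda_J^w)$, and $\mathrm{Tr}_{(w,J)}$ denotes the trace with respect to this joint index of the Gram matrices, instead of the usual trace over the dimension of density operators.

^3 The operator $X^{-1/2}$ is equal to $0$ on $\mathrm{Ker}\,X$ and equal to $(X^{1/2})^{-1}$ on $(\mathrm{Ker}\,X)^{\perp}$.

Using the fact that $2 - 3x + x^2 \leq 2 - 2x$, $x \in [0,1]$, and combining eqs. 5.23 and 5.24, we see that the error probability is upper-bounded by:

\[
\begin{aligned}
p_e \leq 2^{-mR} \sum_{w=1}^{2^{mR}} \Big\{ \sum_{J} \lambda_J^w \Big[ & 2 - 2\gamma_{(w,J),(w,J)} + \sum_{J' \in \mathcal{T}_w,\, J' \neq J} |\gamma_{(w,J),(w,J')}|^2 \\
& + \sum_{w' \neq w} \sum_{J' \in \mathcal{T}_{w'}} |\gamma_{(w,J),(w',J')}|^2 \Big] + \sum_{J \notin \mathcal{T}_w} \lambda_J^w \Big\}
\end{aligned} \tag{5.25}
\]

Note that we expanded the range of the sum from $J \in \mathcal{T}_w$ to all $J$. Some algebra shows that this leads to: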
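\[
\begin{aligned}
p_e &\leq 2^{-mR} \sum_{w=1}^{2^{mR}} \Big\{ 2\,\mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_\Lambda) \big) + \mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_\Lambda) \Pi_w (\mathbb{1} - \Pi_\Lambda) \big) \\
&\qquad\qquad + \sum_{w' \neq w} \mathrm{Tr}\big( \Pi_\Lambda \sigma_w \Pi_\Lambda \Pi_{w'} \big) + \mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_w) \big) \Big\} \\
&\leq 2^{-mR} \sum_{w=1}^{2^{mR}} \Big\{ 3\,\mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_\Lambda) \big) + \sum_{w' \neq w} \mathrm{Tr}\big( \Pi_\Lambda \sigma_w \Pi_\Lambda \Pi_{w'} \big) + \mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_w) \big) \Big\}
\end{aligned} \tag{5.26}
\]

Here we used that $\langle e_J^w | e_{J'}^w \rangle = \delta_{JJ'}$ to introduce the tautology $\Pi_\Lambda = \mathbb{1} - \mathbb{1} + \Pi_\Lambda$. The last inequality follows from the fact that $\mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_\Lambda) \Pi_w (\mathbb{1} - \Pi_\Lambda) \big) \leq \mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_\Lambda) \big)$.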

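Finally, applying the concept of random coding to symmetrize over all codes $c$, we see that:

\[
\begin{aligned}
\bar{p}_e &= \sum_{c} p(c)\, p_e(c) \\
&\leq \sum_{c} p(c)\, 2^{-mR} \sum_{w=1}^{2^{mR}} \Big[ 3\,\mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_\Lambda) \big) + \sum_{w' \neq w} \mathrm{Tr}\big( \Pi_\Lambda \sigma_w \Pi_\Lambda \Pi_{w'} \big) + \mathrm{Tr}\big( \sigma_w (\mathbb{1} - \Pi_w) \big) \Big] \\
&\leq 4\delta + (2^{mR} - 1)\, \mathrm{Tr}\big( 2^{-m(S(\sigma)-\delta)} \Pi_{w'} \big) \\
&\leq 4\delta + (2^{mR} - 1)\, 2^{-m(S(\sigma)-\delta)}\, 2^{m(\bar{S}(\sigma_w)+\delta)}
\end{aligned} \tag{5.27}
\]

where we used typicality arguments of eqs. 5.4, 5.18, 5.21, and the fact that $\Pi_\Lambda \sigma^{\otimes m} \Pi_\Lambda \leq 2^{-m(S(\sigma)-\delta)} \mathbb{1}$. This proves that whenever $R < S(\sigma) - \bar{S}(\sigma_w) - 2\delta$, the second term vanishes as $m$ grows, and since $\delta$ can be chosen arbitrarily small, the probability of error goes to zero as $m$ grows.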
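The last member of eq. 5.27 is easy to evaluate numerically, which helps to see where the rate threshold comes from. The sketch below is only an illustration: the function name and the sample entropies are assumptions, not values taken from the text.

```python
def error_bound(m, rate, s_avg_output, s_bar, delta):
    """Last member of eq. 5.27:
    4*delta + (2^{mR} - 1) * 2^{-m(S(sigma) - delta)} * 2^{m(S_bar + delta)}.

    s_avg_output: entropy S(sigma) of the average channel output
    s_bar:        average per-symbol output entropy
    """
    exponent = -m * (s_avg_output - delta) + m * (s_bar + delta)
    return 4 * delta + (2.0 ** (m * rate) - 1.0) * 2.0 ** exponent


if __name__ == "__main__":
    # Hypothetical numbers: S(sigma) = 1.0, S_bar = 0.4, delta = 0.05,
    # so the threshold S(sigma) - S_bar - 2*delta equals 0.5.
    for rate in (0.4, 0.55):
        print(rate, [round(error_bound(m, rate, 1.0, 0.4, 0.05), 4)
                     for m in (10, 50, 100)])
```

Below the threshold the second term decays exponentially in $m$ and only the $4\delta$ floor remains; above it the bound grows without limit and says nothing about the error.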

Proof 16 ($\Leftarrow$ Weak Converse) To prove that if $R > S(\sigma) - \bar{S}(\sigma_w)$ the error is bounded away from zero, we will use Fano's inequality (see eq. 1.30), as in the classical case:

\[ H(W | Y^m) \leq m P_e R + 1 = m \epsilon_m \tag{5.28} \]

Technically, we are going to prove that when the error goes to zero, the rate must necessarily be less than the capacity. Assuming that messages are equiprobable:

\[
\begin{aligned}
mR &= H(p^W) \\
&= I(W : Y^m) + H(W | Y^m) \\
&\leq I(W : Y^m) + m\epsilon_m \\
&\leq S\Big( 2^{-mR} \sum_{w=1}^{2^{mR}} \sigma_w \Big) - 2^{-mR} \sum_{w=1}^{2^{mR}} S(\sigma_w) + m\epsilon_m \\
&\leq \sum_{k=1}^{m} \Big[ S\Big( 2^{-mR} \sum_{w=1}^{2^{mR}} \sigma_{i_k(w)} \Big) - 2^{-mR} \sum_{w=1}^{2^{mR}} S(\sigma_{i_k(w)}) \Big] + m\epsilon_m \\
&\leq mC + m\epsilon_m
\end{aligned}
\]

where $\sigma_{i_k(w)}$ denotes the $k$-th factor of the product state $\sigma_w$. The second and third inequalities follow from Holevo's bound and the subadditivity of von Neumann's entropy, respectively. The last inequality is due to the definition of the capacity, since all terms in the sum are no greater than the capacity as defined in eq. 5.14. Thus, we have proved that if $R > C$, the error must be bounded away from zero as $m$ grows.

For trace preserving channels (the ones being considered here), it was found that transmitting entangled states does not increase the capacity [64].
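The conclusion of the weak converse can be read off by rearranging the chain of inequalities together with eq. 5.28. The following lines are a short worked rearrangement, added as a sketch; they use only the symbols already defined in the proof.

```latex
% Start from mR <= mC + m*epsilon_m, with m*epsilon_m = m P_e R + 1 (eq. 5.28).
\begin{aligned}
mR &\leq mC + m P_e R + 1 \\
\Longrightarrow\quad P_e &\geq 1 - \frac{C}{R} - \frac{1}{mR} \\
\Longrightarrow\quad \liminf_{m \to \infty} P_e &\geq 1 - \frac{C}{R} \;>\; 0
\qquad \text{whenever } R > C .
\end{aligned}
```

Conversely, if $P_e \to 0$ the first line forces $R \leq C + 1/m$, so the rate cannot exceed the capacity in the limit.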
5.4 Entanglement-Enhanced Classical Communication

The main goal of this dissertation was to argue that entanglement can be used to increase the classical capacity of information transfer. As an example, consider the Quantum Erasure Channel (QEC). The QEC is a map from an algebra of dimension $N$ to an algebra of dimension $N + 1$. It maps an input state to itself with probability $1 - \epsilon$. With probability $\epsilon$ the channel maps its input state to an erasure symbol state, orthogonal to all input states. For the qubit case, the QEC would take the input states $\varrho_0 = |0\rangle\langle 0|$, $\varrho_1 = |1\rangle\langle 1|$ to the output states $\varrho_0 = |0\rangle\langle 0|$, $\varrho_1 = |1\rangle\langle 1|$, and $\varrho_2 = |2\rangle\langle 2|$, with $\langle 0 | 2 \rangle = \langle 1 | 2 \rangle = 0$. The classical capacity of this binary erasure channel is given by [1]:

\[ C = 1 - \epsilon \tag{5.29} \]

It was already shown in 3.4.2 that sharing a maximally entangled pair makes it possible to send two classical bits encoded in just one qubit. If the sender and receiver share an unlimited amount of maximally entangled pairs, it is possible for the sender to pre-process its entangled subsystem in such a way that the total entropy of the state will be:

\[ S_T = S(\mathrm{Tr}_B\,\varrho^{AB}) + \log 2 \tag{5.30} \]

where the first term is the von Neumann entropy of the reduced density operator, which is at a maximum for EPR pairs, and zero for separable states. The second term is due to the choice of the sender between $\varrho_0$ and $\varrho_1$^4. It will always be larger than the entropy of any classical state. The entanglement-assisted capacity $C_E$ of the QEC is given by [65]:

\[ C_E = 2(1 - \epsilon) \tag{5.31} \]

Now we come to the point where it is possible for us to state that [66]:

The entanglement-assisted classical capacity will always be larger than the unassisted classical capacity.

^4 It assumes that both symbols are equiprobable, which maximizes capacity.
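The two capacities of the erasure channel quoted in eqs. 5.29 and 5.31 can be compared with a few lines of code. In the sketch below, which is an illustration with hypothetical function names, the unassisted value is obtained by evaluating the Holevo quantity of eq. 5.15 for equiprobable inputs $|0\rangle\langle 0|$ and $|1\rangle\langle 1|$; the outputs are then diagonal in the basis $\{|0\rangle, |1\rangle, |2\rangle\}$, so von Neumann entropies reduce to Shannon entropies. The entanglement-assisted value is simply taken from eq. 5.31, not derived.

```python
from math import log2


def shannon_entropy(probs):
    """Entropy in bits of a probability vector (zero entries are skipped)."""
    return -sum(p * log2(p) for p in probs if p > 0)


def erasure_holevo(eps, prior=(0.5, 0.5)):
    """Holevo quantity of the qubit erasure channel for inputs |0>, |1>.

    Each output is (1 - eps)|i><i| + eps|2><2|, stored as its diagonal.
    """
    outputs = [[1 - eps, 0.0, eps], [0.0, 1 - eps, eps]]
    average = [sum(p * s[k] for p, s in zip(prior, outputs)) for k in range(3)]
    return shannon_entropy(average) - sum(
        p * shannon_entropy(s) for p, s in zip(prior, outputs))


def entanglement_assisted_capacity(eps):
    """C_E = 2(1 - eps), as quoted in eq. 5.31."""
    return 2 * (1 - eps)


if __name__ == "__main__":
    for eps in (0.0, 0.25, 0.5, 0.75):
        print(eps, round(erasure_holevo(eps), 6),
              entanglement_assisted_capacity(eps))
```

For every erasure probability the computed Holevo quantity matches $1 - \epsilon$, and the assisted capacity is twice that value, which is the factor of two contributed by dense coding.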
Figure 5.2: Both capacities $C$ and $C_E$ versus the probability of error, $\epsilon$.
Conclusions and Final Words



I feel somewhat ashamed of not having included important topics such as quantum source coding, or the duality between channels and entangled states. Moreover, at early stages of this thesis I intended to include, as well, an introduction to multiple user quantum information theory, so in the end this work has fallen short of what it was meant to be. A decision was made on the basis of the time budget, and I hope that this omission will not preclude a self-contained read of the text.

Besides the aforementioned subjects, I would find it interesting to study in more depth some other topics such as quantum rate distortion theory, quantum signal processing or quantum cryptography. However, this would demand considerably more effort than is expected in a master's thesis.

Remarkably, the most technical part of this work was done in the scope of convex optimization, which should not surprise anyone in the Information Theory community. A method for classifying entangled and separable states based on a Minimum Volume Covering Ellipsoid was devised by myself (to my knowledge, no one had done this before) for a class project. In a sense, this constitutes the state-of-the-art part of the thesis. At the time of writing, a document explaining the method can be found in the arxiv.org database.

A word is to be said about my previous knowledge of Quantum Information Theory, concerning the fact that I first read a paper on this subject exactly one year ago. Since then, most of my efforts have been aimed at reaching a positive semidefinite level of expertise in this field. Thus I would say that my contribution amounts to a compilation of the knowledge which I judged essential to understand the role of entanglement in classical channel capacities.